Data Purity and beating the RAP

Recently we had Microsoft Risk Assessment Program (RAP) completed on some of our SQL Server instances. This is where Microsoft deploys some software that performs a similar function to when you go to the doctor for an annual physical. There’s a lot of poking and prodding, they draw samples, and at the end they tell you everything that is wrong. Your servers have high blood sugar, your databases need to take up yoga, etc. It’s pretty much everything except having the MS technician stroke his chin, saying “Hmmm.”

You are going to a doctor for a regularly scheduled checkup, right? Ok then.

Now, I can see how this whole RAP process might be troubling, because it’s very possible to consider this as a deal where your company is paying Microsoft to tell them EVERYTHING YOU ARE DOING WRONG. But what I found is that, for the most part, if you’re a competent DBA and already doing the critical stuff like making regular backups and running DBCC CHECKDB, then you should be fine.

Wait, you are making backups and running integrity checks, right? Ok then.

So it was with more curiosity than fear that I reviewed the results of our RAP, which came back for the most part as expected. I say for the most part because I saw a big red flag that one of our databases was in need of a “Purity Check.”  And I thusly pondered that, stroked my chin, and declared “Hmmm.”

Now if you’re like me, you cooked a chicken dinner with a caper cream sauce last night. Which would be quite the coincidence. But what might be a more likely coincidence is that you think “I am running DBCC CHECKDB on all of my databases, so I shouldn’t have any issues with Purity Checks.” And like me, you couldn’t be wronger, unless of course you write using imaginary words like “wronger”.

I did a little digging and here is what I found out from a site named “SQL CoPilot”:

In versions of SQL Server up to 2000, it was possible for data stored in columns of certain datatypes (see below) to contain out-of-range values.

The most common reason for this was normally the use of bcp to bulk load data, which didn’t carry out comprehensive range checks on the source data.

This issue was resolved in SQL Server 2005, and an option was added to the DBCC CHECKDB command so that this out-of-range data can be detected.

What this meant is that I had some 40 databases that had been created in a version prior to SQL Server 2005, but had been moved to more recent versions of the product. So yes, I am supporting a lot of old systems. In many ways, it’s like bingo night with the databases around here.

Armed with my factoid I busted out my Central Management Server and, with more fear than curiosity ran the following script to find out exactly HOW MANY databases I had that needed a Purity Check. The answer: 40. And then I sighed and broke out my facepalm.

SET NOCOUNT ON

DECLARE @DBINFO table (
ParentObject nvarchar(100)
, ObjectName nvarchar(100)
, Field nvarchar(100)
, ValueName nvarchar(100)
)

DECLARE 
	@DBName sysname
	, @DBId int
	, @Version int
	, @PurityFlag tinyint

SET @DBId = (Select MIN(dbid) from sysdatabases where dbid > 4)

WHILE EXISTS (select 1 from sysdatabases where dbid = @dbid) BEGIN

	SET @DBName = (select [name] from sysdatabases where dbid = @dbid)

	INSERT @DBINFO
	EXEC ('DBCC DBINFO(''' + @DBName + ''') WITH TABLERESULTS')

	SELECT @Version = convert(int, ValueName)
	FROM @DBINFO
	WHERE Field = 'dbi_createVersion'

	SELECT @PurityFlag = convert(tinyint, ValueName)
	FROM @DBINFO
	WHERE Field = 'dbi_dbccFlags'

	IF @PurityFlag = 0
		SELECT @DBName 'DBName', @Version 'Version', @PurityFlag 'PurityFlag'

	DELETE FROM @DBINFO

	SET @DBId = (Select MIN(dbid) from sysdatabases where dbid > @DBId)

	END

Now, if you’re in a similar situation, there are two pieces of good news. The first is that the lack of having a Purity Check run does not mean you automatically have out-of-range values. It just means you haven’t CHECKED for them. The second bit of good news is that once you have run a Purity Check, a bit flag gets set in the databases, attesting that you have checked, and henceforth you never have to check again. Assuming you’re using at least SQL Server 2005, that is.

My next step was to fire up a spreadsheet to hold my 40, and I don’t mean a Mickey’s. I methodically went to each database running the following, looking for those three reassuring words “Commands completed successfully.”

DBCC CHECKDB ('YourDatabase') WITH DATA_PURITY, NO_INFOMSGS

And I got them for nearly all of my 40 databases. Nearly all.

Yes, friends, one lone database turned out to have out-of-range of values in one lone table. I queried the table and found it was. . .wait for it. . .one lone record. I hope this doesn’t violate my NDA, but in all of our many, many terabytes of data we had an out-of-range issue with just one record, which had a phone number of  “-371388542112103”. I’m no data scientist, but that looks like garbage data to me. Long story less long, I broke out my NULL hammer and took care of it.

So now, if you have databases older than YouTube and your employer is considering a RAP run on your databases, you know what to do.

Leave a Reply

Your email address will not be published. Required fields are marked *