Build your own Data quality framework part 1/3

This time I would like to talk about building your own simple Data quality framework using T-SQL. This framework will be implemented as a SQL Agent Job evaluating the Data quality queries against your clients CRM Database and persisting the results from the last 2 runs in a separate DB schema with some front end for Data quality issues fixing and reporting. This solution will also be generic ( not dependent on column names ) so you can combine all kinds of Data quality rules the business can think of.

This tutorial will have 3 parts. In the first part, I would like to give an high perspective overview of what we are building and how this Data quality tool can work. The second part will focus on a high level overview of DB back-end steps needed and the third part will focus on the Data quality fixing and reporting part.

Let me start with saying, that Data quality is absolutely critical when it comes to a group of people entering data into a system. I have a lot of experience from an environment, where there was an internal CRM / ERP system being developed. Issues in Data quality would really be business-stoppers back there ( and probably anywhere else ). Imagine that you would like to invoice your client, but someone from the back-office team had inserted incomplete address or you have multiple addresses assigned to the client and you don’t know which one should be used for invoicing. And trust me, managers who cannot invoice their clients are not the right people to mess with. I am pretty sure that you can come up with a plenty of issues like this.

With the solution I am about to present, you don’t have to buy an pricey 3rd party application to solve these kinds of issues. So lets go through the possible scenarios.

1.Business Analyst knows exactly what is needed to ensure the required data quality at the clients side and manages these rules written in SQL language in an API he is provided with. This API could be implemented as a simple stored procedure used for insert / delete / update based on the parameters provided.

2.CRM Manager comes up with rules needed to have the required data quality in the client application. Let’s suppose this person does not know how to write SQL code, so these rules are handed over to the Business Analyst as verbally described definitions ( i.e. 1 contact can have multiple addresses but only one address can be used for invoicing ) and the Business Analyst translates these rules into the SQL language for example like this:


SELECT Contact_ID, ContactName, ContactOwner_ID, ContactOwnerName, AddressType, COUNT(*) AS [InvoicingAddressCount]
FROM dbo.Contact WITH (NOLOCK)
WHERE AddressType = 'Invoicing'
GROUP BY Contact_ID, ContactName, ContactOwner_ID, ContactOwnerName, AddressType
HAVING COUNT(*) > 1

and passes them into the API.

3.You could also have the DB developer come up with these rules and implementing them. ( definitely not my preferred scenario )

Untitled Diagram

4.Once the rules are in place, the DB job is scheduled and started and the rules are already being evaluated against the CRM Database, there will be a need for 2 front ends on the other side. One front end will be used for the Data quality issues fixing by the CRM users and the second front end will be used for reporting to Management. Seeing how well do your employees manage their business objects Data quality is an interesting KPI.

Next time, I will talk more about the DB project implementation.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s