I got asked a question the other day about data cleansing from a relatively small organisation, who is wondering about whether or not to use a consultant and if so, how. I figured others here might also be facing this issue so I thought it was worth reposting my answer here.
I think the first thing I'd say is to make sure you properly listen to the data experts within your team. It's amazing how many managers/teams/organisations don't really do this. Guess how well that ends?
There are two issues in particularly to engage with your team on, and I'd really recommend discussing them separately to begin with.
The first is to really get to the heart of what data cleansing is required. The term "data cleansing" covers a multitude of sins and the issues are complicated because no large system will ever be 100% clean. You need to draw out what the specific issues are and get some kind of of order of significance. For example:
- Is it that there are multiple records per person? or...
- Is it that address data is old and out of date? or...
- Is it the formatting of data poor? or...
- Are there are lots of mistakes such as badly typed email addresses; or...
- Is it that the underlying structure is all wrong for what you are trying to achieve; or...
- Is data stored inconsistently – in different places for different records. etc.
Get a really good sense of that, the size of each issue and the degree to which it's stopping you to reach your goals before choosing / talking to a consultant.
The other issue to engage them on is how they feel about the task of taking on this data cleanse versus getting someone else in. Would they welcome the help or is this something which if other demands on them were to be stemmed for a while, or with a bit more resource, temps etc. they could execute themselves? I think these things are best discussed separately, at least initially, because for me at least, the concerns about the second set of questions can hamper getting to bottom of the first question.
The other thing which is also worth thinking about both with your team(s) and any potential consultant is how will you shut the back door? That is, how will you ensure your data doesn't get messy as quickly again? Again some questions to ponder:
- What needs to change to prevent your pristine newly cleansed system from slipping back into its current messy state?
- Do there need to be investments with how data is collected?
- Could some processes be better automated or systematised?
- Do you need some standard agreed & documented policies for how data is entered, a data dictionary etc.?
- Do you need to have a regular spring data clean to address the problems before they build up?
In terms of choosing a consultant, again I'd say involve some of your team in the process. In any eventuality they'll have to work closely together and they may also have key insights to differentiate between the snake-oil salesmen and the real experts. Also do your due diligence around how any potential consultant's is ensuring data security. For example ask how data will be transferred, when they'll dispose of it etc. But that's a question for another blog.
No comments:
Post a Comment