If you are reading this entry, it is because you need to improve your data. But are you clear about what you need? Do you know your data gathering sources? Is your data reliable?
As you can see, there are several questions you need to answer before treating your data.
Are you clear about what you need?
In the world of data quality, there are several criteria for treatment or improvement, and in order to apply each criterion, you must take into account two fundamental factors:
- Starting quality of the data that is about to be treated
- Final quality you want to achieve
Depending on the reliability of the initial data and the result you are after, you will need specific methods to reach your goal. The most important thing, therefore, is to be sure of what you want to achieve.
Let’s look at a couple of examples:
- Do you have an e-mail address and want to run a commercial campaign?
You must make sure that the recipient can actually receive your message, or the conversion rate will be disastrous.
- Do you know your client’s name, but you do not know their gender?
You must analyze their name and assign them a gender in order to segment your subsequent actions.
As you can see, it all depends on the initial data and what you want to achieve.
Do you know your data gathering sources?
Knowing the origin of your data allows you to establish an origin reliability index. The less reliable the origin, the lower the quality of your data.
Besides, you must take into account that your gathering methods largely determine how reliable your data is. The stricter you are when gathering data, the fewer problems you will have when the time comes to use it.
Analyzing all your data origins tells you in advance which flows have received less attention and are therefore less strict, and so which ones will deliver less reliable data.
Ideally, you should take control of all your origins and apply the strictest criteria that suit your business, taking into account how you will use the data in the future.
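As a rough illustration of an origin reliability index, the sketch below scores each origin by the share of its records that pass a validity check. This is only one possible definition; the field names (`origin`, `email`), the sample records, and the check itself are assumptions made for the example.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable


def reliability_by_origin(
    records: Iterable[dict],
    is_valid: Callable[[dict], bool],
) -> Dict[str, float]:
    """Return, for each origin, the fraction of its records that pass is_valid."""
    totals = defaultdict(int)
    valid = defaultdict(int)
    for record in records:
        origin = record["origin"]  # assumed field naming the gathering source
        totals[origin] += 1
        if is_valid(record):
            valid[origin] += 1
    return {origin: valid[origin] / totals[origin] for origin in totals}


# Hypothetical sample data: two gathering flows with different strictness.
sample_records = [
    {"origin": "web_form", "email": "ana@example.com"},
    {"origin": "web_form", "email": "luis@example.org"},
    {"origin": "call_center", "email": "marta@example.com"},
    {"origin": "call_center", "email": "no email given"},
]

# Deliberately naive check, used here only to keep the example short.
index = reliability_by_origin(sample_records, lambda r: "@" in r["email"])
```

With this sample, `web_form` scores 1.0 and `call_center` scores 0.5, which points to the flow that needs stricter gathering rules.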
Is your data reliable?
Visually, it is easy to detect the values that do not obey certain rules:
- An e-mail that does not have a correct format
- A postal code that does not have the required length
- A name that was incorrectly written
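Format rules like the first two can also be checked automatically. The following is a minimal sketch, assuming a deliberately loose e-mail pattern and a numeric postal code of five digits; both rules are assumptions and should be adapted to your country and business.

```python
import re

# Loose pattern: some text, an "@", some text, a dot, some text, and no spaces.
# It is not a full implementation of the e-mail address standard.
EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def has_valid_email_format(email: str) -> bool:
    """Check only the shape of the address, not whether it can receive mail."""
    return EMAIL_PATTERN.fullmatch(email.strip()) is not None


def has_valid_postal_code_length(code: str, expected_length: int = 5) -> bool:
    """Check that the code is all digits and has the expected length."""
    code = code.strip()
    return code.isdigit() and len(code) == expected_length
```

For example, `has_valid_email_format("ana@example")` returns `False` because the domain has no dot, and `has_valid_postal_code_length("2801")` returns `False` because one digit is missing.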
At the same time, there are other reliability problems that cannot be seen at first glance, such as:
- The domain of an e-mail address cannot receive mail
- The postal code does not exist
- The address does not exist
To assess reliability in these latter cases, we must use specific tools that can detect when a value does not meet the minimum requirements to be useful for your business.
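To show how an existence check differs from a format check, here is a small sketch that looks a postal code up in reference data. The `KNOWN_POSTAL_CODES` set is a tiny hypothetical sample; a real tool would rely on a complete, up-to-date list from a postal authority or a data provider, and checking whether an e-mail domain can receive mail would additionally require a DNS lookup, which is not shown here.

```python
# Hypothetical reference data, kept tiny for the example.
KNOWN_POSTAL_CODES = frozenset({"28013", "08001", "41001"})


def postal_code_exists(code: str, known_codes: frozenset = KNOWN_POSTAL_CODES) -> bool:
    """Return True only if the code appears in the reference data."""
    return code.strip() in known_codes
```

A value such as `"99999"` has the right length and would pass a format check, yet `postal_code_exists("99999")` returns `False` because it is not in the reference data.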
Quality is a key aspect of any database, and without it, your business has no future.