Do you have incorrect values in your database and you are not clear about what do you need to do in order to to fix them? In this post, we provide you with a guide on how to easily solve this problem.
What do we understand as an incorrect value?
A value of a field is considered to be incorrect when it does not correspond to the values allowed for the field in question.
For example, a value of the “municipality” field is incorrect when the existing value:
- Does not correspond to any of the allowed municipalities (a non-existent municipality at Spain, for example).
- Contains some typing error at the moment of saving the value (instead of “Madrid”, it has saved “Mardid”).
All these mistakes can be rectified by applying the relevant debugging and validations before or after the capture process.
When do we need to review the existing data?
To determine if we have to fix the value of a field, it is fundamental to know if the value is essential for some action of our business.
For example, if we need to execute:
- An e-mail marketing action. In that case, the e-mail field will be fundamental, and therefore, we will check that there are no errors in the existing values.
For example: [email protected] or [email protected] would correspond to [email protected]
- A communication campaign by email, phone or postal address. In our case, we need to confirm that the contact name is correct.
For example: “Juan Maneul” would correspond to “Juan Manuel”.
- A postal shipment. We do not have all the data of the postal address, or we have not correctly completed it, due to errors at the time of the user registration.
Example: “maiorca street 120, BCN” would correspond to “Mallorca Street, 120, 08036, Barcelona”.
Therefore, it is very important to be clear about which fields are fundamental for our business. Once the affected fields are clarified, we have to carry out a review (or profiling) to be clear about which number of records will be affected at the time of the review.
How can we avoid errors in our fields?
To avoid problems regarding invalid values, we have two options:
- Avoid the entry of non-valid data, applying strict business rules during capturing.
- Cleaning the data once they are captured.
The first option is recommended when you want to limit possible errors in the database.
In some cases, it may be interesting to apply the minimum validations in those non-critical fields and apply more strict criteria when the user wants to save the fields that can be used in future actions (such as zip code, address… ).
What techniques can we apply in order to correct values in our base?
Currently, there are several techniques available to detect and fix any error, such as:
- Patterns or regular expressions:
It allows detecting if the values of the fields comply with certain rules, and to delete those that are not accepted.
For example, for phones we shall only accept digits.
- Levenshtein distance:
It allows assigning nearby values, taking into account the number of different characters between the expected value and the correct value.
For example, for the value “Jaun” we would assign the value “Juan” (a character of difference between the correct and origin values).
- Fuzzy Logic:
It allows assigning correct values during entry, applying phonetic algorithms (there are several algorithms: metaphone, double metaphone, Soundex, Caverphone, …).
For example, hotmail.com o htmial.com correspond to HTMLKM, with which we would assign hotmail.com as the correct value for all cases.
It allows assigning a correct value from a limited set of accepted values. Every unidentified new entry assumes a correspondence in the dictionary.
For example, it is applicable to changes in street names, or dialect variants of the same street (Conde Urgell, Compte d’Urgell, …).
As you may see, there are several options to solve the detected errors on our fields, and it all depends on the field to correct and how strict we want to be.
Besides, it is very common to combine the aforementioned techniques to solve complex problems of incorrect values.
How can UProc help you in the inspection?
UProc was born with the goal of providing all the necessary utilities to debug, validate, enrich and unify all your data sources in a centralized and categorized catalog, with multiple data families and typologies.
In the case that you want to clean or review the values of your data sourcesCatalog, you can access the and follow these steps::
- Type the name of the field you want to correct in the Catalog searcher and press Enter.
- Click on any of the found tools, such as:
- Once in the tool, you can verify any value of your data sources, or consult the available examples, clicking on “Try Now”.
After verifying the value of the service, you can see the result of your request below the entry data, in the “Results” section.
To easily integrate you can use our API. If you click the link “Show API information” you may find examples in multiple programming languages that your technical team can use to accelerate the integration process with UProc.
Can I clean multiple values with UProc?
To clean multiple emails you have to register and use the Processing Assistant (uploading a file –Excel or CSV-) or to make a call to the API.
Do you need a better understanding of what the debugging and/or normalization entails?
You can read our entry aboutHow to treat your data: Formatting , to answer your questions about the data cleaning process.
Besides, we are at your disposal from the chat or the contact form, to address any question or query you might have about the service.