
Data Quality as a Crucial Part of DataOps  

Data quality is an aspect of data operations that should not be overlooked. The amount of data being generated and analyzed continues to increase, and analysis performed on poor quality data will produce flawed results. Data quality is a concern in many fields, from supply chain management (SCM) to healthcare. Poor quality data leads to erroneous results, which in turn can lead to wrong decisions and, eventually, to increased costs.

In fact, the Data Warehousing Institute (TDWI) has estimated that poor quality data costs businesses as much as $611 billion a year. A further problem is that data entry errors and missing data may not be discovered right away, so organizations may not fully understand the long-term impact of poor quality data. In the business world, customers may be lost, which results in a loss of future sales and referrals. In healthcare systems, patient treatment may be seriously compromised.

It is also important to realize that the most useful data is data that can be used for more than one purpose. For instance, patient data can be used to decide which treatments to give a particular patient, and can be used again at a later date as part of a research study on disease trends or treatment effectiveness.


How do we ensure quality data?

The first step is collecting the data. This needs to be done with care, and the data needs to be recorded correctly. In a healthcare setting this means that a provider needs to ask the appropriate questions to obtain information from patients that is as complete as possible. Lab technicians and other personnel also need to be diligent about recording information in the patient record. The same vigilance is needed when entering data in other areas of study, whether this is climate data, epidemiological data or data from a business. All data must be entered correctly and checked for accuracy.

Individuals responsible for entering data need to be properly trained to ensure accuracy. It is also important that these personnel check their work often to make sure they are not making mistakes. The importance of accurate data entry needs to be emphasized during training so that people are aware that consequences can be severe.

Domain experts are also needed to ensure data quality. Such experts should be professionals within their respective fields who know and understand what information is important and useful for making decisions and for providing insight. For example, for healthcare data, the experts should include medical doctors who can provide the necessary expertise.

Data validation is an important part of checking that data is of high quality. A thorough validation process, even a lengthy one, is necessary if we are to have faith in our data. Validation should follow a set approach so that every dataset is held to the same standard. Features such as completeness, uniformity and plausibility should all be checked during the validation process. All aspects of the data need to be checked, including the metadata and the units that are used. This becomes especially important when data from different sources is combined into a single dataset.
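The three checks named above can be sketched in a few lines of Python. This is only a minimal illustration and not a description of any particular validation pipeline: the field names, the expected unit and the plausible weight range are hypothetical values chosen for the example.

```python
# Hypothetical field names, unit and range, chosen only for illustration.
REQUIRED_FIELDS = ("patient_id", "weight", "weight_unit")
EXPECTED_UNITS = {"weight_unit": "kg"}
PLAUSIBLE_RANGES = {"weight": (0.5, 400.0)}


def validate_record(record):
    """Return a list of human-readable problems found in one record."""
    problems = []

    # Completeness: every required field is present and non-empty.
    for field in REQUIRED_FIELDS:
        if record.get(field) in (None, ""):
            problems.append(f"missing {field}")

    # Uniformity: a recorded unit must match the one unit the dataset expects.
    for field, unit in EXPECTED_UNITS.items():
        value = record.get(field)
        if value not in (None, "") and value != unit:
            problems.append(f"{field} is {value!r}, expected {unit!r}")

    # Plausibility: numeric values must fall inside a believable range.
    for field, (low, high) in PLAUSIBLE_RANGES.items():
        value = record.get(field)
        if isinstance(value, (int, float)) and not low <= value <= high:
            problems.append(f"{field}={value} outside [{low}, {high}]")

    return problems
```

A record that passes all three checks yields an empty list, while a record with a blank identifier, a weight recorded in pounds and an implausibly large weight yields one message per problem. In practice the ranges and units would be set by the domain experts mentioned earlier rather than hard-coded.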

Benefits of quality data

There are numerous benefits to having quality data. For one thing, costs are kept down and more confidence can be placed in the results of data analysis. In the business world, customers will have more faith in a company that has accurate data. It is easy to understand, for instance, that customers will become infuriated if they receive incorrect bills. Good customer relationships are crucial if businesses are to be successful and remain operational. Happy customers are likely to lead to increased future sales and more referrals to new customers.

In the healthcare arena the benefits of good data are also great. Accurate and complete patient data allows doctors to design the best treatment options for their patients. Patient data taken collectively can also reveal trends that inform the decisions of hospital managers. For instance, knowing when an emergency room becomes very busy could enable managers to put more people on duty during those times. Another advantage of good healthcare data is that it can be used to show epidemiological trends and can support researchers who are conducting studies. Quality data helps ensure that research results are valid.

Decision-making relies on data and regardless of what area of study the data is in, the decisions made will only be valid if based on good data. Quality data should lead to better decision-making by government and non-government organizations.


Poor quality data is very costly, with the cost to companies estimated at billions of dollars every year. Part of the problem is that erroneous data is not always noticed. This leads to problems down the line, with businesses losing revenue as their reputation suffers. In the healthcare setting, poor data in electronic health records (EHRs) can lead to patients being given incorrect treatments. Incorrect data also causes problems for managers, since it can result in the wrong decisions being made.

Data needs to be of high quality from the start if the analysis of the data and the interpretation of the results are to be valid. To begin with, data needs to be entered accurately, and personnel need training on how to do this correctly and how to check their work for mistakes. Domain expertise and data validation both help ensure the quality of the data. John Snow Labs employs domain experts for this purpose and uses a very long data validation process. Good quality data is essential for DataOps, and John Snow Labs recognizes it as a very important part of data operations.
