A methodology for prior management of temporal data quality in a data mining process

Mouhamed Diop, Ibrahima Fall, Alassane Bah, Mamadou Samba Camara

doi:10.1109/isacv.2017.8054906

Abstract

In Data Mining (DM) projects, and more specifically in the Data Understanding and Data Preparation phases, several techniques from the literature are used to detect and handle data quality problems such as missing data, outliers, inconsistent data, or time-variant data. However, the main limitation in applying these techniques is the complexity caused by a lack of anticipation in the detection and resolution of data quality problems. To address this, a DM process model designed for the prior management of data quality was recently proposed. Its distinctive feature is that it links the DM process and the Software Engineering (SE) process by combining the two in parallel. However, the authors of that work [1] specified only what should be done, not how it should be done. The present work improves on that DM process model by adding a methodology that gives concrete guidelines on how to combine the SE process and the DM process in order to anticipate and manage the data quality problems that can arise during the mining process. This work specifically addresses the case of temporal data. The main contribution of this methodology is a concrete definition of how to anticipate and automate all the activities needed to remove temporal data quality problems in a mining process.
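The abstract does not describe the authors' procedure itself. As a purely hypothetical illustration of the kind of temporal data quality check that such a methodology aims to anticipate and automate, the Python sketch below scans timestamped attribute values and flags missing values, future-dated records, superseded (time-variant) values, and stale values. The `Record` type, the `check_temporal_quality` function, and the `max_age_days` staleness threshold are assumptions introduced for this example, not part of the paper.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Record:
    # Hypothetical record shape: one attribute value of one entity,
    # valid from a given date onward.
    entity: str
    attribute: str
    value: Optional[float]
    valid_from: date


def check_temporal_quality(records, reference_date, max_age_days):
    """Hypothetical checker: returns (issues, current).

    issues  -- list of (entity, attribute, valid_from, problem) tuples
    current -- dict mapping (entity, attribute) to the latest usable Record
    """
    issues = []
    current = {}
    # Process records in chronological order so later values supersede earlier ones.
    for r in sorted(records, key=lambda rec: rec.valid_from):
        key = (r.entity, r.attribute)
        if r.value is None:
            issues.append((r.entity, r.attribute, r.valid_from, "missing"))
            continue
        if r.valid_from > reference_date:
            issues.append((r.entity, r.attribute, r.valid_from, "future-dated"))
            continue
        if key in current:
            # Time-variant data: an older value has been replaced by a newer one.
            old = current[key]
            issues.append((old.entity, old.attribute, old.valid_from, "superseded"))
        current[key] = r
    # A current value older than the allowed age is flagged as stale (but kept).
    for r in current.values():
        if (reference_date - r.valid_from).days > max_age_days:
            issues.append((r.entity, r.attribute, r.valid_from, "stale"))
    return issues, current
```

Keeping stale values in `current` while flagging them is a design choice for the sketch: deciding whether to discard, refresh, or keep an outdated value is left to the later preparation step.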
