Abstract

In Data Mining (DM) projects, and more specifically in the Data Understanding and Data Preparation phases, several techniques from the literature are used to detect and handle data quality problems such as missing data, outliers, inconsistent data and time-variant data. However, the main limitation in applying these techniques is the complexity caused by a lack of anticipation in the detection and resolution of data quality problems. To address this, a DM process model designed for the prior management of data quality was recently proposed. Its distinctive feature is that it links the DM process with the Software Engineering (SE) process by running the two in parallel. However, the authors of that work [1] only specified what should be done, not how it should be done. The present research work improves on that DM process model by adding a methodology that gives concrete guidelines on how to combine the SE and DM processes in order to anticipate and manage the data quality problems that can arise during the mining process. This work specifically addresses the case of temporal data. The main contribution of this methodology is a concrete definition of how to anticipate and automate all the activities needed to remove temporal data quality problems in a mining process.
