Abstract

Data can be viewed as a product, and its quality can be assessed like that of any product: along multiple dimensions. Traditionally, data quality referred only to the accuracy of the data collection system. In the AI age, accuracy is no longer the only dimension for measuring data quality, and the concept has expanded into a sophisticated, comprehensive one. For the same data set, different users may apply different quality standards. High-quality data is a prerequisite for effective modeling and for obtaining value from data. A data quality assessment matrix composed of multiple dimensions can measure data quality both subjectively and objectively. For a particular application, the matrix should reflect the degree to which the data satisfies that application's requirements. In this study, a data quality assessment matrix is designed to quantitatively measure the quality of imbalanced multivariate time series data from a complex manufacturing process. Multiple sensors are placed along the machines, and the process of collecting signals about the manufacturing process can be viewed as a data production process. Deep learning methods can be applied to the data to monitor the machines and predict paper break failures. A control chart is designed for the data production process to control data quality and to help improve the quality of the training dataset for deep learning models. The control chart is more direct than a hierarchical clustering result. The assessment results and the control chart can help data users understand the data more deeply and build better models.
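
To make the idea of a control chart for a data production process concrete, here is a minimal sketch. The abstract does not say which chart type or quality metric the study uses, so everything here is an assumption: the chart is a standard Shewhart individuals chart, the monitored quantity is a hypothetical per-batch missing-value rate, and the function names and numbers are illustrative only.

```python
from statistics import mean


def individuals_control_limits(baseline):
    """Return (LCL, center line, UCL) for a Shewhart individuals chart.

    Limits are estimated from the average moving range of the baseline.
    This chart type is an assumption, not the method used in the study.
    """
    if len(baseline) < 2:
        raise ValueError("need at least two baseline points")
    center = mean(baseline)
    moving_ranges = [abs(b - a) for a, b in zip(baseline, baseline[1:])]
    # 2.66 = 3 / d2, where d2 = 1.128 for moving ranges of two points.
    half_width = 2.66 * mean(moving_ranges)
    return center - half_width, center, center + half_width


def out_of_control(values, lower, upper):
    """Return the indices of points that fall outside the control limits."""
    return [i for i, v in enumerate(values) if v < lower or v > upper]


# Hypothetical quality metric: fraction of missing sensor readings per batch.
baseline = [0.020, 0.022, 0.019, 0.021, 0.020, 0.023, 0.018, 0.021]
lcl, cl, ucl = individuals_control_limits(baseline)

new_batches = [0.021, 0.019, 0.060, 0.022]
# A rate cannot be negative, so the lower limit is clipped at zero.
flagged = out_of_control(new_batches, max(lcl, 0.0), ucl)
print(flagged)
```

Batches flagged this way could be inspected or excluded before they reach a training set, which is one plausible reading of how a control chart helps improve training data quality.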
