Research on Feature Extraction Method of Data Quality Intelligent Detection

Weiwei Liu, Xiaokun Zheng, Shuya Lei, Xiao Liang

doi:10.1007/978-3-031-11217-1_27

Abstract

This paper studied a feature extraction method for intelligent data quality detection. A text segmentation model, word clustering, similarity calculation, and related methods were applied to the data asset list to generate a data quality detection feature keyword library and a data asset feature list, which were then used to perform data quality detection. For the first time, the data knowledge contained in the data asset list was used to extract data characteristics and to accumulate business knowledge. The adaptability of the method to different data types was also studied for the first time. Moreover, general-purpose data quality detection was carried out on a large volume of discrete data. The results showed that extracting data features automatically from the data asset list was more efficient than manual work, and that it addressed the shortcomings of incomplete statistics and insufficient accuracy in feature extraction. In addition, the generality of data quality detection was further improved and its blind scanning range was reduced, which significantly improved the efficiency and accuracy of intelligent data quality detection.

Keywords: Data asset · Word segmentation · Feature
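The pipeline summarized in the abstract (segment the descriptions in a data asset list, group words into feature clusters, match fields to features by similarity, then run detection only on matched fields) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption rather than the authors' method: the regex tokenizer `segment` stands in for a text segmentation model, the hand-written `FEATURE_CLUSTERS` stand in for the output of word clustering, Jaccard similarity is swapped in as the similarity measure, and `QUALITY_RULES`, the 0.2 threshold, and the sample asset list and rows are all hypothetical.

```python
import re
from datetime import datetime
from typing import Callable

# Hypothetical feature keyword library: each feature maps to a cluster of related
# words, standing in for the result of word clustering (not the paper's library).
FEATURE_CLUSTERS: dict[str, set[str]] = {
    "phone": {"phone", "mobile", "tel", "telephone"},
    "email": {"email", "mail"},
    "date": {"date", "day", "time"},
}


def _valid_date(value: str) -> bool:
    """Accept only real calendar dates in YYYY-MM-DD form."""
    try:
        datetime.strptime(value, "%Y-%m-%d")
        return True
    except ValueError:
        return False


# Hypothetical per-feature quality rules; True means the value passes.
QUALITY_RULES: dict[str, Callable[[str], bool]] = {
    "phone": lambda v: re.fullmatch(r"\d{11}", v) is not None,  # assumed 11-digit numbers
    "email": lambda v: re.fullmatch(r"[^@\s]+@[^@\s]+\.[A-Za-z]{2,}", v) is not None,
    "date": _valid_date,
}


def segment(text: str) -> set[str]:
    """Naive segmentation into lowercase alphanumeric tokens (placeholder for a real model)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |A & B| / |A | B|, defined as 0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def extract_features(asset_list: dict[str, str], threshold: float = 0.2) -> dict[str, str]:
    """Map each column to its most similar feature; columns below the threshold are dropped."""
    features = {}
    for column, description in asset_list.items():
        tokens = segment(description)
        best, score = max(
            ((name, jaccard(tokens, cluster)) for name, cluster in FEATURE_CLUSTERS.items()),
            key=lambda pair: pair[1],
        )
        if score >= threshold:
            features[column] = best
    return features


def detect(rows: list[dict[str, str]], features: dict[str, str]) -> dict[str, int]:
    """Apply only the rules of matched features and count failing values per column."""
    failures = {column: 0 for column in features}
    for row in rows:
        for column, feature in features.items():
            if not QUALITY_RULES[feature](row.get(column, "")):
                failures[column] += 1
    return failures


# Hypothetical data asset list: column name -> business description.
asset_list = {
    "cust_phone": "Customer mobile phone number",
    "reg_date": "Registration date",
    "contact_mail": "Contact email address",
    "remark": "Free text remark",
}
rows = [
    {"cust_phone": "13800138000", "reg_date": "2024-02-30", "contact_mail": "a@b.com", "remark": "x"},
    {"cust_phone": "12345", "reg_date": "2024-01-15", "contact_mail": "not-an-email", "remark": "y"},
]

features = extract_features(asset_list)
print(features)                # {'cust_phone': 'phone', 'reg_date': 'date', 'contact_mail': 'email'}
print(detect(rows, features))  # {'cust_phone': 1, 'reg_date': 1, 'contact_mail': 1}
```

In this sketch, a column whose description matches no feature (here `remark`) is never scanned, which is one way extracting features from the asset list could narrow the blind scanning range described in the abstract.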
