Abstract

This paper studies a feature extraction method for intelligent data quality detection. A text segmentation model, word clustering, similarity calculation, and related methods were applied to the data asset list to generate a keyword library of data quality detection features and a data asset feature list, which were then used to perform data quality detection. The data knowledge in the data asset list was used, for the first time, to extract data characteristics and capture business knowledge. The adaptability of the method to different data types was also studied for the first time. In addition, general data quality detection was carried out for a large volume of discrete data. The results showed that extracting data features automatically from the data asset list, rather than manually, improved efficiency and overcame the incomplete statistics and insufficient accuracy of feature extraction. The generality of data quality detection was further improved and its blind scanning range was reduced, leading to a significant improvement in the efficiency and accuracy of intelligent data quality detection.

Keywords: Data asset; Word segmentation; Feature
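As a rough illustration of the pipeline summarized in the abstract, the sketch below tokenizes the descriptions in a small data asset list, keeps recurring tokens as a feature keyword library, and uses Jaccard similarity to match a new field description to the closest known asset. Everything here is an assumption made for illustration: the sample `ASSET_LIST`, the function names, the regex tokenizer (a stand-in for the paper's text segmentation model), the frequency threshold (a stand-in for word clustering), and the choice of Jaccard as the similarity measure. It is not the authors' implementation.

```python
import re
from collections import Counter

# Hypothetical data asset list: field name -> business description.
ASSET_LIST = {
    "cust_phone": "customer contact phone number",
    "supplier_phone": "supplier phone number",
    "cust_id_card": "customer identity card number",
    "order_date": "order creation date",
}

def tokenize(text):
    """Lowercase and split on non-alphanumerics (stand-in for a segmentation model)."""
    return [tok for tok in re.split(r"[^a-z0-9]+", text.lower()) if tok]

def build_keyword_library(asset_list, min_count=2):
    """Keep tokens that appear in at least `min_count` asset descriptions."""
    counts = Counter(
        tok for desc in asset_list.values() for tok in set(tokenize(desc))
    )
    return {tok for tok, n in counts.items() if n >= min_count}

def build_feature_list(asset_list, keywords):
    """Map each asset to the library keywords found in its description."""
    return {
        name: set(tokenize(desc)) & keywords for name, desc in asset_list.items()
    }

def jaccard(a, b):
    """Jaccard similarity of two token collections; 0.0 when both are empty."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def most_similar(description, asset_list):
    """Return (asset name, score) for the asset closest to `description`."""
    query = tokenize(description)
    scored = [
        (name, jaccard(query, tokenize(desc))) for name, desc in asset_list.items()
    ]
    return max(scored, key=lambda pair: pair[1])

keywords = build_keyword_library(ASSET_LIST)
features = build_feature_list(ASSET_LIST, keywords)
best_name, best_score = most_similar("mobile phone number", ASSET_LIST)
```

In this toy setup, a field matched to a known asset could reuse that asset's detection rules instead of being scanned blindly, which is the kind of narrowing the abstract refers to.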
