Abstract

The speed of cleaning earthquake-emergency web document data is a key factor in emergency rescue decision-making. Data classification is the core step of data cleaning, and its efficiency determines the cleaning speed. Based on earthquake-emergency web document data and HTML structural features, this article combines the TF-IDF algorithm with an information calculation model, improves the word-frequency and position-factor parameters, and proposes P-TF-IDF, a weighted-frequency algorithm for earthquake-emergency web documents. Filtering out low-frequency words and optimizing the n-gram feature word vectors of the FastText model effectively improves the efficiency of web document classification. For the classified text data, missing-data recognition rules, data classification rules, and data repair rules are used to design an artificial-intelligence-based data cleaning framework for earthquake-emergency network information. The framework detects invalid values in data sets, performs data comparison and redundancy judgment, resolves data conflicts and errors, and generates a complete data set without duplication. The data cleaning framework not only fuses earthquake-emergency network information but also provides a data foundation for the visualization of earthquake-emergency data.
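The abstract describes a TF-IDF variant that weights terms by their position in the HTML structure and filters low-frequency words. The paper's actual P-TF-IDF parameters are not given here, so the sketch below is a minimal illustration under assumptions: hypothetical position weights for `title`, `heading`, and `body` fields, and a corpus-frequency threshold `min_freq` for the low-frequency filter.

```python
import math
from collections import Counter

# Hypothetical position weights by HTML field; the paper's actual
# P-TF-IDF position-factor values are not stated in the abstract.
POSITION_WEIGHTS = {"title": 3.0, "heading": 2.0, "body": 1.0}

def p_tf_idf(docs, min_freq=2):
    """docs: list of dicts mapping an HTML field name to its token list.
    Returns one {term: weight} dict per document, dropping terms whose
    total corpus frequency is below min_freq (low-frequency filtering)."""
    # Corpus-wide term counts drive the low-frequency filter.
    corpus_freq = Counter(t for d in docs for toks in d.values() for t in toks)
    # Document frequency for the IDF component.
    n_docs = len(docs)
    df = Counter()
    for d in docs:
        kept = {t for toks in d.values() for t in toks
                if corpus_freq[t] >= min_freq}
        df.update(kept)
    scores = []
    for d in docs:
        # Term frequency weighted by the field (position) it appears in.
        weighted_tf = Counter()
        for field, toks in d.items():
            w = POSITION_WEIGHTS.get(field, 1.0)
            for t in toks:
                if corpus_freq[t] >= min_freq:
                    weighted_tf[t] += w
        total = sum(weighted_tf.values()) or 1.0
        scores.append({
            t: (c / total) * math.log((n_docs + 1) / (df[t] + 1))
            for t, c in weighted_tf.items()
        })
    return scores
```

Under this weighting, a term that appears in the title contributes more than the same number of occurrences in the body, so structurally salient words rank higher before the vectors are handed to the classifier.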
