Research and Application of Statistical Method of Data Reduction Based on Empirical Distribution

Jun Sun

doi:10.23977/ferm.2021.040606

Abstract

Data reduction produces a reduced representation of a data set that is smaller than the original yet approximately preserves its integrity. Mining the reduced data set is more efficient and yields the same, or almost the same, analysis results. This paper proposes two data reduction methods. The first is an estimation algorithm for continuous multivariate coupled distributions of arbitrary form: the distribution is estimated from samples using the empirical distribution function, and new individuals are generated by sampling from it. The second introduces the idea of clustering into data reduction, yielding a time-dimension reduction method whose basic idea is to cluster the time dimension of time series data. To verify the feasibility of the two methods, a set of simulation experiments is designed in which each method is applied to representative data. The experiments show that both methods effectively reduce the amount of data while also improving classification accuracy, which indicates strong practicability.
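
As a rough illustration of the first idea (estimating a distribution from samples with the empirical distribution function and generating new individuals by sampling), the following minimal sketch handles the one-dimensional case only. It is not the paper's multivariate algorithm; the function names `ecdf` and `sample_from_ecdf` and the `seed` parameter are hypothetical, chosen here for illustration.

```python
import bisect
import random


def ecdf(sample):
    """Return the sorted values and the empirical CDF evaluated at each of them.

    For n observations, the i-th smallest value (0-based) gets probability (i + 1) / n.
    """
    xs = sorted(sample)
    n = len(xs)
    probs = [(i + 1) / n for i in range(n)]
    return xs, probs


def sample_from_ecdf(sample, size, seed=None):
    """Generate `size` new values by inverse-transform sampling on the empirical CDF.

    Assumption: a one-dimensional sample; the multivariate coupling described
    in the abstract is not modelled here.
    """
    rng = random.Random(seed)
    xs, probs = ecdf(sample)
    draws = []
    for _ in range(size):
        u = rng.random()  # uniform on [0, 1)
        # Index of the smallest observed value whose empirical CDF is >= u.
        idx = bisect.bisect_left(probs, u)
        draws.append(xs[idx])
    return draws


if __name__ == "__main__":
    observed = [2.3, 1.1, 4.8, 3.3, 2.9, 1.7]
    print(sample_from_ecdf(observed, size=3, seed=42))
```

Because the empirical distribution function is a step function, this sketch can only return values already present in the sample; a smoothed or interpolated inverse would be needed to produce unseen values.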

Full Text