Abstract

Statistical data analysis has a long history, going back to the 1920s. Many fundamental concepts of multivariate statistical data analysis, especially the purely theoretical ones, had been established by the 1950s. From the 1960s onward, practical applications of multivariate statistical data analysis became available along with the progress of computers, and these applications have in turn influenced theoretical considerations.

The basic process of data analysis is as follows:
p1) An objective of the data analysis is given.
p2) Data that seem to be closely connected with the objective are observed (sampling of data).
p3) A model (or a set of models) is constructed to explain the variation of the data.
p4) The original data are preprocessed (or transformed) to make the input data consistent with the model.
p5) The model is identified on the basis of the observed (input) data.
p6) The goodness of fit is evaluated. If it is insufficient, return to p2) or p3); otherwise go to the next step.
p7) The result is interpreted and its validity is investigated.

The main difference between “data mining” and statistical data analysis seems to lie in the concept of “data”. In data mining, the data are given in advance as a database, whereas in statistical data analysis the data are observed according to the objective of the analysis. On the other hand, the objective of data mining is to find effective (or valuable) information in the data. Within the framework of statistical data analysis above, the main processes of data mining are p3), p4) and p5). However, the concept of “effective information” in data mining differs from the main part of the data variation in statistical data analysis. For instance, in principal component analysis, the main part of the data variation is obtained as the first principal component, which has the largest proportion of variance. In data mining, however, the major variation of the data is of no interest, because the knowledge obtained from it is trivial. Data mining therefore seems to be interested in the principal components with small proportions, in order to obtain unusual but valuable information. Hence, statistical data analysis of residual data, obtained by removing the main part of the data variation from the original data, will be useful for data mining.
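
The idea of analyzing residual data can be illustrated with a minimal sketch. It is not the authors' procedure, only one plausible reading of it: the data are centered, the first principal component is removed by projection, and observations are then ranked by how much variation remains. The function name `pca_residual`, the synthetic data, the planted unusual observation at row 17, and the use of the residual norm as an "unusualness" score are all assumptions made for illustration.

```python
import numpy as np


def pca_residual(X, n_main=1):
    """Remove the first n_main principal components from X (hypothetical helper).

    Returns the residual data, the proportion of variance carried by each
    principal component, and the removed principal directions.
    """
    Xc = X - X.mean(axis=0)                      # center each variable
    # SVD of the centered data: rows of Vt are the principal directions
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    proportions = s**2 / np.sum(s**2)            # proportion of variance per component
    main = Vt[:n_main]                           # directions of the main variation
    residual = Xc - Xc @ main.T @ main           # project the main variation out
    return residual, proportions, main


# Synthetic data (assumed): one dominant factor plus small noise in 4 variables
rng = np.random.default_rng(0)
n = 200
factor = rng.normal(size=(n, 1))
loadings = np.array([[3.0, 2.0, 1.0, 0.5]])
X = factor @ loadings + 0.1 * rng.normal(size=(n, 4))

# Plant one unusual observation that deviates off the dominant direction
X[17] += np.array([0.0, 0.0, 0.0, 2.0])

residual, proportions, main = pca_residual(X, n_main=1)

# Score each observation by the size of what the main component cannot explain
scores = np.linalg.norm(residual, axis=1)
most_unusual = int(np.argmax(scores))

print("proportion of variance per component:", np.round(proportions, 4))
print("most unusual observation:", most_unusual)
```

In this toy setting the first component carries almost all of the variance and mostly reflects the dominant factor, while the planted observation only stands out once that component has been removed. With real data, how many components count as the "main part" is a modeling choice rather than something this sketch decides.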
