Perspective on Data Mining from Statistical Viewpoints

Yoshiharu Sato

doi:10.1007/3-540-45571-x_1

Abstract

Statistical data analysis has a long history, going back to the 1920s. Many fundamental concepts of multivariate statistical data analysis, especially the purely theoretical notions, had been established by the 1950s. From the 1960s onward, practical applications of multivariate statistical data analysis became available with the progress of computers, and these applications in turn influenced theoretical work.

The basic process of data analysis is as follows:

p1) An objective of the data analysis is given.
p2) Data that seem to be closely connected with the objective are observed (sampling data).
p3) A model (or a set of models) is constructed to explain the variation of the data.
p4) The original data are preprocessed (or transformed) to make the input data consistent with the model.
p5) The model is identified based on the observed (input) data.
p6) The goodness of fit is evaluated. If it is insufficient, return to p2) or p3); otherwise go on to the next step.
p7) The result is interpreted and its validity is investigated.

The biggest difference between “data mining” and statistical data analysis seems to be the concept of “data”. In data mining, the data are given in advance as a database, whereas in statistical data analysis the data are observed according to the objective of the analysis. On the other hand, the objective of “data mining” is to find effective (or valuable) information in the data. In terms of the framework above, the main processes of data mining are p3), p4) and p5). However, the concept of “effective information” in data mining differs from the main part of the data variation in statistical data analysis. For instance, in principal component analysis, the main part of the data variation is obtained as the first principal component, which has the largest proportion of variance. In data mining, by contrast, the major variation of the data is of no interest, because the knowledge obtained from it is trivial. Data mining therefore seems to be interested in the principal components with small proportions, in order to get unusual but valuable information. Hence, statistical data analysis of residual data, obtained by removing the main part of the data variation from the original data, will be useful for data mining.
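The idea of analysing residual data can be illustrated with a small sketch. The example below is not taken from the paper: the synthetic data set, the function names, and the use of power iteration to approximate the first principal component are all assumptions made for illustration. It removes the first principal component from centered data and then looks for the observation that stands out most in what remains.

```python
import random

def center(rows):
    """Subtract the column means so each variable has mean zero."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]

def covariance(rows):
    """Sample covariance matrix of rows that are already centered."""
    n, p = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(p)]
            for i in range(p)]

def leading_eigenvector(matrix, iterations=500):
    """Power iteration (an illustrative choice): approximates the
    eigenvector belonging to the largest eigenvalue. Assumes the start
    vector is not orthogonal to that eigenvector."""
    p = len(matrix)
    v = [1.0 / p ** 0.5] * p
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v

def remove_component(rows, v):
    """Residual data: subtract each row's projection onto direction v."""
    residual = []
    for r in rows:
        score = sum(a * b for a, b in zip(r, v))
        residual.append([a - score * b for a, b in zip(r, v)])
    return residual

def total_variance(rows):
    """Sum of the variances of all variables (trace of the covariance)."""
    cov = covariance(rows)
    return sum(cov[i][i] for i in range(len(cov)))

# Hypothetical data: one dominant common factor drives the first two
# variables, and a single planted observation deviates in the third.
random.seed(0)
data = []
for _ in range(200):
    t = random.gauss(0, 5)
    data.append([t + random.gauss(0, 0.3),
                 t + random.gauss(0, 0.3),
                 random.gauss(0, 0.3)])
data.append([0.0, 0.0, 4.0])  # the unusual observation (index 200)

X = center(data)
v1 = leading_eigenvector(covariance(X))
residual = remove_component(X, v1)

# Proportion of the total variance carried by the first component.
proportion = 1.0 - total_variance(residual) / total_variance(X)

# Observation with the largest squared norm in the residual data.
most_unusual = max(range(len(residual)),
                   key=lambda i: sum(x * x for x in residual[i]))

print("proportion of first component:", round(proportion, 3))
print("most unusual observation:", most_unusual)
```

In this toy setting the first component absorbs nearly all of the variance, while the planted observation only becomes conspicuous once that component has been removed. In practice one would more likely use an eigendecomposition routine from a numerical library than a hand-written power iteration.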
