Abstract

Discretization is a necessary pre-processing step in many data mining tasks and a way to improve the performance of many machine learning algorithms. Existing techniques mainly focus on one-dimensional discretization in low-dimensional data spaces. In this paper, we present an intrinsic dimensional correlation discretization technique for high-dimensional data spaces. The approach estimates the intrinsic dimensionality (ID) of the data using maximum likelihood estimation (MLE). We then use principal component analysis (PCA), which can uncover the latent correlation structure in multivariate data, to project the data onto an eigenspace whose dimension equals the estimated ID. All dimensions of the data are thus transformed into a new, independent eigenspace of the ID, and each dimension can be discretized separately in that eigenspace with the MODL method, which is based on a Bayesian discretization model. We also design a heuristic framework to search for a better discretization scheme. Our approach yields a significant improvement in the mean learning accuracy of classifiers over traditional discretization methods.
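The three stages described above (ID estimation, PCA projection, per-dimension discretization) can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: the ID estimator is written in the nearest-neighbor MLE form of Levina and Bickel, equal-frequency binning is used as a simple stand-in for MODL (which is not implemented here), and the function names and the parameters `k` and `n_bins` are hypothetical. The heuristic search for a better discretization scheme is not shown.

```python
import numpy as np

def estimate_intrinsic_dim(X, k=10):
    """Nearest-neighbor MLE of intrinsic dimensionality.

    Assumed form (Levina-Bickel style); requires distinct points so that
    no neighbor distance is zero.
    """
    # Full pairwise Euclidean distance matrix; fine for small samples.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.linalg.norm(diff, axis=2)
    # Sort each row; column 0 is the point itself (distance 0), so skip it.
    knn = np.sort(dist, axis=1)[:, 1:k + 1]
    # log(T_k / T_j) for j = 1..k-1, where T_j is the j-th neighbor distance.
    log_ratio = np.log(knn[:, -1:] / knn[:, :-1])
    local_dim = (k - 1) / log_ratio.sum(axis=1)
    # Average the per-point estimates.
    return float(local_dim.mean())

def pca_project(X, n_components):
    """Project centered data onto its leading principal directions."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def equal_frequency_bins(column, n_bins=4):
    """Stand-in discretizer (NOT MODL): bins with roughly equal counts."""
    cuts = np.quantile(column, np.linspace(0, 1, n_bins + 1)[1:-1])
    # Map each value to the index of the bin it falls into (0..n_bins-1).
    return np.searchsorted(cuts, column, side="right")

def run_pipeline(X, k=10, n_bins=4):
    """Estimate ID, project with PCA, then discretize each new dimension."""
    est = estimate_intrinsic_dim(X, k=k)
    d = min(max(1, int(round(est))), X.shape[1])
    Z = pca_project(X, d)
    codes = np.column_stack(
        [equal_frequency_bins(Z[:, j], n_bins) for j in range(d)]
    )
    return d, codes
```

For a data matrix `X` with one row per sample, `run_pipeline(X)` returns the rounded ID estimate and an integer matrix of bin indices with one column per retained eigenspace dimension. Swapping `equal_frequency_bins` for a supervised discretizer such as MODL would require class labels, which this sketch does not use.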
