Abstract
Discretization is one of the data preprocessing topics in the field of data mining, and is a critical issue to improve the efficiency and quality of data mining. Multi-scale can reveal the structure and hierarchical characteristics of data objects, the representation of the data in different granularities will be obtained if we make a reasonable hierarchical division for a research object. The multi-scale theory is introduced into the process of data discretization and a data discretization method based on multi-scale and information entropy called MSE is proposed. MSE first conducts scale partition on the domain attribute to obtain candidate cut point set with different granularity. Then, the information entropy is applied to the candidate cut point set, and the candidate cut point with the minimum information entropy is selected and detected in turn to determine the final cut point set using the MDLPC criterion. In such way, MSE avoids the problem that the candidate cut points are limited to only certain limited attribute values caused by considering only the statistical attribute values in the traditional discretization methods, and reduces the number of candidates by controlling the data division hierarchy to an optimal range. Finally, the extensive experiments show that MSE achieves high performance in terms of discretization efficiency and classification accuracy, especially when it is applied to support vector machines, random forest, and decision trees.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.