Abstract

Learning algorithms may operate on numerical (continuous) attributes, on nominal (discrete) attributes, or on both. If an algorithm accepts only nominal or discrete attributes, the continuous attributes of a dataset must be discretized before the algorithm is applied. Discretization methods are of two types: supervised and unsupervised. Supervised methods use the class labels of the dataset, whereas unsupervised methods disregard class labels entirely. Many studies have shown that supervised methods produce good discretization results. However, supervised methods cannot be applied when the dataset is unlabeled. In practice, many datasets have no class (label) attribute, and in such cases only unsupervised discretization methods are applicable. This paper presents discretization schemes for unlabeled data based on Rough Set Theory (RST) and clustering. Experiments on two benchmark datasets compare the proposed technique with other discretization methods designed for labeled data. Two criteria are used for this comparison: Class-Attribute Interdependence Redundancy (CAIR) and the total number of intervals. The results show a satisfactory tradeoff between information loss and the number of intervals for the proposed method.
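The abstract does not detail the RST- and clustering-based schemes, so the sketch below does not reproduce them. It only illustrates the evaluation setting under stated assumptions:

- A continuous attribute is discretized without looking at class labels. Plain 1-D k-means is used here as a generic stand-in chosen for illustration, not as the paper's method.
- The resulting intervals are scored with CAIR. CAIR is taken in its usual form, R = I(C;D) / H(C,D): the mutual information between the class C and the discretized attribute D, divided by their joint entropy.

The function names `kmeans_1d_cuts`, `discretize` and `cair` are hypothetical.

```python
import bisect
import math
from collections import Counter


def kmeans_1d_cuts(values, k, iters=100):
    """Hypothetical unsupervised discretizer: cluster one continuous attribute
    with plain 1-D k-means (class labels are never consulted) and return the
    cut points halfway between adjacent cluster centres."""
    xs = sorted(values)
    n = len(xs)
    # Deterministic start: centres at evenly spaced order statistics.
    centres = [xs[min(n - 1, (2 * i + 1) * n // (2 * k))] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in xs:
            # Assign each value to its nearest centre.
            j = min(range(k), key=lambda c: abs(x - centres[c]))
            groups[j].append(x)
        # Move each centre to the mean of its group (keep it if the group is empty).
        new = [sum(g) / len(g) if g else centres[i] for i, g in enumerate(groups)]
        if new == centres:
            break
        centres = new
    centres = sorted(set(centres))  # drop duplicate centres, if any
    return [(a + b) / 2 for a, b in zip(centres, centres[1:])]


def discretize(values, cuts):
    """Map each value to the index of its interval (a value equal to a cut
    point falls into the lower interval)."""
    return [bisect.bisect_left(cuts, v) for v in values]


def cair(classes, intervals):
    """CAIR R = I(C;D) / H(C,D): mutual information between the class C and
    the discretized attribute D, normalised by their joint entropy."""
    n = len(classes)
    joint = Counter(zip(classes, intervals))
    n_c = Counter(classes)
    n_d = Counter(intervals)
    h_joint = -sum((m / n) * math.log2(m / n) for m in joint.values())
    if h_joint == 0.0:
        # Degenerate case (one class, one interval): returning 0 is an assumed convention.
        return 0.0
    mutual = sum((m / n) * math.log2(m * n / (n_c[c] * n_d[d]))
                 for (c, d), m in joint.items())
    return mutual / h_joint
```

On labeled benchmark data, the class column is used only for scoring, never for choosing cut points. Running the sketch for several values of `k` yields the two criteria named in the abstract side by side: CAIR, and the number of intervals (`len(cuts) + 1`).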
