Abstract
Real-world database applications hold massive data collections in different formats, such as continuous, discrete, or nominal. Continuous data complicates analysis because a value can fall anywhere within a range, so granule mining has recently been applied, with techniques such as neighbourhood rough sets, to discover granules in continuous data. This approach has yet to address the design of granule resolution. This paper therefore presents a novel method, Hierarchical Clustering-based Granulation (HCluG), which improves granule identification in continuous data by combining hierarchical clustering with neighbourhood rough sets, reducing user involvement in tuning granule resolution parameters and introducing automated granule discovery. HCluG includes a feature selection method that evaluates the quality of the granules generated with the proposed granule approximations. Experimental results show that HCluG reduces the number of selected features while improving classification performance. On average, HCluG outperforms the rough-set-based feature selection baselines when used with K-Nearest Neighbours and a Radial Basis Function Support Vector Machine, and also performs better than using the complete feature set. The method can be used in data analysis to achieve high classification performance with fewer features and less user involvement.
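The general idea of forming granules by clustering a continuous feature and then scoring them with a rough-set measure can be illustrated with a small sketch. This is not the HCluG algorithm itself, whose details are not given in the abstract. It assumes single-linkage clustering on one feature, a hand-picked cut threshold, and the classical rough-set dependency degree (the fraction of samples lying in class-pure granules) as the quality score. The function names `single_linkage_granules` and `dependency` and the toy data are hypothetical.

```python
from collections import defaultdict


def single_linkage_granules(values, threshold):
    """Assign each 1-D value to a granule (assumed single-linkage clustering).

    For one-dimensional data, cutting a single-linkage dendrogram at distance
    `threshold` is the same as splitting the sorted values wherever the gap
    between neighbouring values exceeds `threshold`.
    Returns a list of granule ids aligned with `values`.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    granule = 0
    for prev, cur in zip(order, order[1:]):
        if values[cur] - values[prev] > threshold:
            granule += 1  # gap too wide: start a new granule
        labels[cur] = granule
    return labels


def dependency(granule_ids, classes):
    """Rough-set dependency degree: share of samples in class-pure granules."""
    members = defaultdict(list)
    for g, c in zip(granule_ids, classes):
        members[g].append(c)
    # The positive region is the union of granules containing a single class.
    positive = sum(len(cs) for cs in members.values() if len(set(cs)) == 1)
    return positive / len(classes)


# Toy data (hypothetical): one continuous feature and a class label per sample.
feature = [0.10, 0.12, 0.15, 0.80, 0.85, 0.50]
classes = ["a", "a", "a", "b", "b", "a"]

fine = single_linkage_granules(feature, threshold=0.1)
coarse = single_linkage_granules(feature, threshold=0.32)

print(fine, dependency(fine, classes))
print(coarse, dependency(coarse, classes))
```

In this toy case the finer cut yields three class-pure granules, while the coarser cut merges the value 0.50 with the "b" samples and lowers the dependency score. This shows why the choice of granule resolution matters and why automating it is attractive.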