Integrated Clustering and Feature Selection Scheme for Text Documents.

Thangamani Thangamani

doi:10.3844/jcssp.2010.536.541

Open Access

https://doi.org/10.3844/jcssp.2010.536.541

Journal: Journal of Computer Science	Publication Date: May 1, 2010
Citations: 22	License type: cc-by

Abstract

Problem statement: Text documents are unstructured databases that contain raw data collections. Clustering techniques are used to group text documents according to their similarity. Approach: Feature selection techniques were used to improve the efficiency and accuracy of the clustering process. Feature selection was done by eliminating redundant and irrelevant items from the text document contents. Statistical methods were used in the text clustering and feature selection algorithm. In the term-based text clustering and feature selection method, the cube size is very high and the accuracy is low. A semantic clustering and feature selection method was proposed to improve the clustering and feature selection mechanism using the semantic relations of the text documents. The proposed system was designed to identify the semantic relations using an ontology, which was used to represent the relationships between terms and concepts. Results: The synonym, meronym and hypernym relationships were represented in the ontology. The concept weights were estimated with reference to the ontology and were used in the clustering process. The system was implemented with two methods: term clustering with feature selection and semantic clustering with feature selection. Conclusion: The performance analysis was carried out with the term clustering and semantic clustering methods. The accuracy and efficiency factors were analyzed in the performance analysis.
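
As a rough illustration of the idea in the abstract, the following Python sketch maps terms to concepts through a small ontology and accumulates per-concept weights, then keeps only concepts shared by several documents as features. Everything in it is an assumption made for illustration: the toy `ONTOLOGY` entries, the `RELATION_WEIGHT` values, the function names and the document-frequency threshold are hypothetical and are not taken from the paper, which does not state its weighting formula in the abstract.

```python
from collections import Counter

# Hypothetical toy ontology: each term maps to (concept, relation) pairs.
ONTOLOGY = {
    "car": [("automobile", "synonym"), ("vehicle", "hypernym")],
    "automobile": [("automobile", "synonym"), ("vehicle", "hypernym")],
    "wheel": [("automobile", "meronym")],
    "truck": [("vehicle", "hypernym")],
}

# Assumed relation weights; the abstract does not give any values.
RELATION_WEIGHT = {"synonym": 1.0, "hypernym": 0.5, "meronym": 0.3}


def concept_weights(tokens):
    """Sum relation-weighted term counts into per-concept weights."""
    weights = Counter()
    for term, count in Counter(tokens).items():
        # Terms missing from the ontology contribute nothing.
        for concept, relation in ONTOLOGY.get(term, []):
            weights[concept] += count * RELATION_WEIGHT[relation]
    return dict(weights)


def select_features(doc_weights, min_docs=2):
    """Keep concepts that occur in at least min_docs documents."""
    doc_freq = Counter(c for w in doc_weights for c in w)
    return sorted(c for c, n in doc_freq.items() if n >= min_docs)
```

The resulting concept-weight vectors, restricted to the selected concepts, could then be passed to any standard clustering algorithm in place of raw term counts.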

Full Text