Abstract

Text classification is the process of automatically sorting a set of documents into categories from a predefined set. Feature clustering is a powerful method for reducing the dimensionality of feature vectors in text classification. After pre-processing, documents can be clustered at the schema level based on the relative occurrence of words. The clustering process groups words according to their occurrence patterns. The proposed feature clustering mechanism finds pattern matches against the relevant data present in the database.
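The idea of grouping words by occurrence pattern to shrink the feature space can be illustrated with a small sketch. It is not the paper's mechanism: it assumes that two words belong to the same cluster when they occur in exactly the same documents, and the function names (`word_patterns`, `cluster_words`, `reduce_features`) and the toy documents are hypothetical.

```python
from collections import defaultdict

def word_patterns(docs):
    """Map each word to the tuple of its counts across the documents."""
    vocab = sorted({w for d in docs for w in d.split()})
    return {w: tuple(d.split().count(w) for d in docs) for w in vocab}

def cluster_words(patterns):
    """Group words that occur in exactly the same documents (assumed criterion)."""
    groups = defaultdict(list)
    for word, counts in patterns.items():
        key = tuple(c > 0 for c in counts)  # presence/absence pattern
        groups[key].append(word)
    return list(groups.values())

def reduce_features(docs, clusters):
    """Build one feature per word cluster: the summed counts of its words."""
    return [
        [sum(d.split().count(w) for w in cluster) for cluster in clusters]
        for d in docs
    ]

# Toy corpus, invented for illustration only.
docs = [
    "xml schema tag",
    "xml schema element",
    "stock market price",
    "market price stock",
]
patterns = word_patterns(docs)
clusters = cluster_words(patterns)
vectors = reduce_features(docs, clusters)
print(len(patterns), "word features reduced to", len(clusters), "cluster features")
```

A practical system would likely use a softer similarity between patterns than exact matching, but the effect is the same in kind: each document vector has one dimension per word cluster rather than one per word.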

Highlights

  • Automatic text classification can efficiently analyze a set of documents and organize the data based on certain categories[2]

  • Although the learning ability and the computational complexity of training in support vector machines may be independent of the dimension of the document feature space, minimizing complexity is essential for efficiently handling a large number of terms in practical applications of text classification; novel dimension reduction methods are adopted to reduce the dimension of the document vectors dramatically

  • Many approaches to retrieving XML data have been proposed in recent years; these approaches mainly focus on clustering XML documents based on structure
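As a rough illustration of what structure-based comparison of XML documents can look like, the sketch below scores two documents by the overlap of their root-to-node tag paths. The choice of tag paths and of Jaccard similarity is an assumption made for this example, not a method taken from the approaches cited above; the function names and sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET

def tag_paths(xml_text):
    """Return the set of root-to-node tag paths in an XML document."""
    root = ET.fromstring(xml_text)
    paths = set()
    stack = [(root, root.tag)]
    while stack:
        node, path = stack.pop()
        paths.add(path)
        for child in node:
            stack.append((child, path + "/" + child.tag))
    return paths

def structural_similarity(a, b):
    """Jaccard similarity of the two documents' tag-path sets (assumed measure)."""
    pa, pb = tag_paths(a), tag_paths(b)
    return len(pa & pb) / len(pa | pb)

# Sample documents, invented for illustration only.
doc_a = "<book><title>A</title><author>X</author></book>"
doc_b = "<book><title>B</title><year>2001</year></book>"
doc_c = "<order><item>pen</item></order>"

print(structural_similarity(doc_a, doc_b))
print(structural_similarity(doc_a, doc_c))
```

Text content is ignored on purpose here: only element names and nesting contribute, so documents with the same layout but different values score as identical.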


Summary

Introduction

Automatic text classification can efficiently analyze a set of documents and organize the data based on certain categories[2]. Document clustering is defined as the automatic division of data, or the grouping of a set of objects, such that objects in the same group are more similar to each other than to those in another group[1]. Clustering organizes XML documents according to their similarity, without knowing the structural representation of the XML documents[4]. Analyzing the data from a particular database or data repository yields, from the cluster, the set of data similar to the searched string. A cluster is a collection of data gathered at a particular point from many data sets or documents, which can then be retrieved. $L = \{w_1, w_2, \ldots\}$ is the set of clusters, $C = \{c_1, c_2, \ldots\}$ is the set of classes (for supervised evaluation), and $N$ is the number of objects[6].
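The notation of clusters, classes, and N objects is the usual setup for supervised cluster evaluation. The excerpt does not say which measure is used, so the sketch below assumes purity as one common choice: each cluster is credited with its most frequent class, and the credited counts are summed and divided by N. The function name and the sample labels are hypothetical.

```python
from collections import Counter

def purity(cluster_ids, class_labels):
    """Purity: sum over clusters of the majority-class count, divided by N."""
    if len(cluster_ids) != len(class_labels) or not cluster_ids:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(cluster_ids)  # N, the number of objects
    members = {}
    for cid, label in zip(cluster_ids, class_labels):
        members.setdefault(cid, []).append(label)
    # For each cluster, count the objects that belong to its most frequent class.
    majority_total = sum(
        Counter(labels).most_common(1)[0][1] for labels in members.values()
    )
    return majority_total / n

# Six objects in two clusters; labels invented for illustration only.
cluster_ids = [0, 0, 0, 1, 1, 1]
class_labels = ["a", "a", "b", "b", "b", "a"]
print(purity(cluster_ids, class_labels))
```

Purity ranges up to 1.0 for clusters that each contain a single class, but it also rises as the number of clusters grows, which is why it is usually reported alongside other measures.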

Related Work
An Overview
Preprocessing of XML Documents
Clustering of XML Documents
Experiments
Clustered Data
Conclusion
