Abstract

Clustering is one of the most important techniques in machine learning and data mining. Clustering techniques group similar documents, and a similarity measure is used to determine the associations between transactions. Hierarchical clustering methods produce tree-structured results, while partition-based clustering models produce results in a grid format. Text documents are unstructured data with high-dimensional attributes. Document clustering groups unlabeled text documents into meaningful clusters. Traditional clustering methods require the cluster count (K) before the document grouping process begins, and clustering accuracy decreases drastically when the cluster count is unsuitable. Document word features are automatically partitioned into two groups: discriminative words and non-discriminative words. Only discriminative words are useful for grouping documents; non-discriminative words confuse the clustering process and lead to poor cluster solutions. A variational inference algorithm is used to infer the structure of the document collection and the partition of document words at the same time. The Dirichlet Process Mixture (DPM) model is used to partition documents. The DPM clustering model utilizes both the data likelihood and the clustering property of the Dirichlet Process (DP). The Dirichlet Process Mixture Model for Feature Partition (DPMFP) is used to discover the latent cluster structure based on the DPM model, and it does not require the number of clusters as input. The discriminative word identification process is enhanced with a labeled document analysis mechanism. Concept relationships are analyzed with ontology support. Semantic weight analysis is used for the document similarity measure. This method increases scalability by using labels and concept relations in the dimensionality reduction process.
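The clustering property of the Dirichlet Process mentioned above can be illustrated with a minimal sketch of its Chinese Restaurant Process view, in which each new document joins an existing cluster with probability proportional to that cluster's size, or opens a new cluster with probability proportional to a concentration parameter. This is only the prior over partitions, not the DPMFP model or its variational inference; the function name `crp_assignments` and the parameters `n_docs`, `alpha`, and `seed` are hypothetical names chosen for illustration.

```python
import random


def crp_assignments(n_docs, alpha, seed=0):
    """Sample cluster labels for n_docs items from a Chinese Restaurant Process.

    alpha (assumed > 0) is the concentration parameter: larger values make
    new clusters more likely. The number of clusters is not given as input.
    """
    rng = random.Random(seed)
    labels = []  # labels[i] = cluster index assigned to item i
    sizes = []   # sizes[k] = number of items currently in cluster k
    for _ in range(n_docs):
        # Existing cluster k has weight sizes[k]; a brand-new cluster has weight alpha.
        weights = sizes + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(sizes):
            sizes.append(1)   # open a new cluster
        else:
            sizes[k] += 1     # join an existing cluster
        labels.append(k)
    return labels, sizes


if __name__ == "__main__":
    labels, sizes = crp_assignments(n_docs=200, alpha=1.0, seed=0)
    print("number of clusters:", len(sizes))
    print("cluster sizes:", sizes)
```

In this sketch the cluster count emerges from the sampling process rather than being fixed beforehand, which is the behavior the abstract contrasts with methods that need K in advance. A full DPM model would combine this prior with the data likelihood of each document.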
