Abstract

Document clustering, also referred to as text clustering, is the process of grouping similar documents together according to a chosen similarity function. Informative features such as phrases and their weights are considered especially important for efficient document clustering. This paper addresses two key parts of achieving efficient document clustering. The first part is a phrase-based document model, named the Document Adjacency List, which describes how a phrase-based model of the document set is constructed. It supports efficient phrase matching, which is used to determine the similarity among documents. The second part is a document clustering algorithm that builds on the Document Adjacency List to cluster documents according to the similarity measure. Combining the two parts yields a better calculation of similarity among documents, and that similarity in turn determines the document clusters.
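The two parts described above can be illustrated with a minimal sketch. The abstract does not specify the layout of the Document Adjacency List, the similarity measure, or the clustering procedure, so everything below is an assumption made for illustration: the adjacency list maps each word to its successor words and records which documents contain each adjacent word pair, similarity is taken as the Jaccard overlap of those pairs, and clustering is a simple single-pass threshold scheme. The names `build_adjacency_list`, `pair_similarity`, `cluster_documents`, and the `threshold` parameter are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def build_adjacency_list(docs):
    """Map word -> successor word -> set of ids of documents containing that pair.

    Assumed structure for illustration; the paper's actual model may differ.
    """
    adjacency = defaultdict(lambda: defaultdict(set))
    for doc_id, text in docs.items():
        words = text.lower().split()
        for first, second in zip(words, words[1:]):
            adjacency[first][second].add(doc_id)
    return adjacency


def pair_similarity(adjacency, doc_a, doc_b):
    """Jaccard overlap of adjacent word pairs (an assumed similarity measure)."""
    shared = 0
    total = 0
    for successors in adjacency.values():
        for doc_ids in successors.values():
            in_a = doc_a in doc_ids
            in_b = doc_b in doc_ids
            if in_a or in_b:
                total += 1
            if in_a and in_b:
                shared += 1
    return shared / total if total else 0.0


def cluster_documents(docs, threshold=0.2):
    """Single-pass clustering: join the first cluster holding a similar document."""
    adjacency = build_adjacency_list(docs)
    clusters = []
    for doc_id in docs:
        for members in clusters:
            if any(pair_similarity(adjacency, doc_id, other) >= threshold
                   for other in members):
                members.append(doc_id)
                break
        else:
            # No existing cluster is similar enough, so start a new one.
            clusters.append([doc_id])
    return clusters


if __name__ == "__main__":
    sample = {
        "d1": "river bank flooded after heavy rain",
        "d2": "heavy rain flooded the river bank",
        "d3": "stock market prices fell sharply",
    }
    print(cluster_documents(sample))
```

In this sketch, documents that share word pairs such as "river bank" and "heavy rain" end up in the same cluster, while a document with no shared pairs starts its own. A phrase-weighted measure, as the abstract suggests, would replace the plain pair count with per-phrase weights.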
