Abstract

Objective: To implement multi-viewpoint similarity for comparing documents by means of clustering. Methods/Analysis: Clustering, a core task of data mining, groups objects that are similar to one another. Data mining divides the document data into meaningful clusters and analyses it. Clustering methods include hierarchical and partitional clustering, and the grouping may be based on measures such as Euclidean distance or viewpoints. The existing system uses single-viewpoint similarity. Its main disadvantage is that it does not use the full set of document data, so detailed comparison measures cannot be revealed. The proposed system uses multi-viewpoint similarity to overcome this disadvantage. Findings: The multi-viewpoint similarity method compares multiple documents in detail: the documents are compared line by line and their similarity is shown. We have also enhanced the existing algorithm and named the result ECSMTP (Enhanced Concept Based Similarity Measure for Text Processing). This algorithm categorizes data from the selected documents together with the weightage of each document, forms clusters on that basis, and calculates the similarity measure. Unlike the existing system, the proposed system can compare different kinds of documents, such as plain text, Word, and PDF documents. The user selects the kind of document, and comparisons are made on the selected documents. Clusters are formed and then compared with one another.
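The contrast between single-viewpoint and multi-viewpoint similarity can be illustrated with a small sketch. It is not the ECSMTP algorithm, whose details are not given in this abstract. It assumes a formulation commonly used in the multi-viewpoint clustering literature: the similarity of two documents is the average dot product of their difference vectors as seen from every other document, whereas cosine similarity looks only from the origin. The function names, the term-frequency weighting, and the sample documents are hypothetical choices made for illustration.

```python
import math
from collections import Counter


def tf_vector(text, vocab):
    """Unit-length term-frequency vector over a fixed vocabulary (assumed weighting)."""
    counts = Counter(text.lower().split())
    vec = [float(counts[term]) for term in vocab]
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def single_viewpoint_sim(a, b):
    """Cosine similarity of two unit vectors: one viewpoint, the origin."""
    return sum(x * y for x, y in zip(a, b))


def multi_viewpoint_sim(i, j, vectors):
    """Average of (d_i - d_h) . (d_j - d_h) over every other document d_h.

    Assumed formulation: each remaining document acts as one viewpoint.
    """
    viewpoints = [v for k, v in enumerate(vectors) if k not in (i, j)]
    if not viewpoints:
        raise ValueError("at least one document besides i and j is required")
    total = 0.0
    for h in viewpoints:
        total += sum((a - c) * (b - c)
                     for a, b, c in zip(vectors[i], vectors[j], h))
    return total / len(viewpoints)


# Hypothetical sample documents.
docs = [
    "data mining clustering",
    "data mining clusters documents",
    "football match score",
]
vocab = sorted({word for doc in docs for word in doc.lower().split()})
vectors = [tf_vector(doc, vocab) for doc in docs]

if __name__ == "__main__":
    print("cosine(0, 1) =", single_viewpoint_sim(vectors[0], vectors[1]))
    print("mvs(0, 1)    =", multi_viewpoint_sim(0, 1, vectors))
    print("mvs(0, 2)    =", multi_viewpoint_sim(0, 2, vectors))
```

In this sketch the two data-mining documents score higher than the data-mining and football pair, because the measure uses the remaining documents as reference points instead of the origin alone. A clustering step would then group documents by these scores.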
