Abstract

Mining the implicit knowledge in electronic documents is a critical task in text analysis and data mining. A knowledge-based view of electronic documents can be obtained not only by topic-based clustering but also by clustering based on extracted content. This paper therefore presents a novel method that clusters electronic documents, summarizes the full text from the extracted segments, and evaluates the importance of the extractions to the document using multiple measures. The method extracts eighteen kinds of named entities and two kinds of syntactic phrases and uses them for text clustering. A novel similarity equation is proposed for computing similarity over the extractions. In addition, three measures of importance to the document are proposed; they offer a different view of the document’s content and indicate what users should check first. The method can thus improve the efficiency of knowledge discovery and the management of large-scale document collections.
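
The idea of clustering documents by their extractions can be illustrated with a minimal sketch. The abstract does not state the proposed similarity equation, so the sketch substitutes plain Jaccard overlap between sets of extracted items; it is not the authors' equation. The function names, the greedy single-pass clustering, the threshold of 0.3, and the sample extractions are all assumptions made for illustration.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Stand-in similarity: shared extractions divided by all extractions."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster(docs: Dict[str, Set[str]], threshold: float = 0.3) -> List[List[str]]:
    """Greedy single-pass clustering (an assumed, simplified scheme).

    A document joins the first cluster whose first member is similar
    enough to it; otherwise it starts a new cluster.
    """
    clusters: List[List[str]] = []
    for name, items in docs.items():
        for members in clusters:
            if jaccard(items, docs[members[0]]) >= threshold:
                members.append(name)
                break
        else:
            clusters.append([name])
    return clusters


# Hypothetical extractions (named entities and phrases) for three documents.
docs = {
    "doc_a": {"IBM", "New York", "cloud computing"},
    "doc_b": {"IBM", "cloud computing", "Watson"},
    "doc_c": {"Paris", "Louvre", "art exhibition"},
}
clusters = cluster(docs)
print(clusters)
```

Here `doc_a` and `doc_b` share two of four distinct extractions, so they fall into one cluster, while `doc_c` shares none and forms its own. Any other set or vector similarity could be swapped in for `jaccard` without changing the clustering loop.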
