Abstract

This paper explores advances in the data mining field to solve the fundamental Document Information Retrieval problem. In the proposed approach, useful knowledge is first discovered by using data mining techniques, then swarms use this knowledge to explore the whole space of documents intelligently. We have investigated two data mining techniques in the preprocessing step. The first one aims to split the collection of documents into similar clusters by using the K-means algorithm, while the second one extracts the most closed frequent terms on each cluster already created using the DCI_Closed algorithm. For the solving step, BSO (Bees Swarm Optimization) is used to explore the cluster of documents deeply. The proposed approach has been evaluated on well-known collections such as CACM (Collection of ACM), TREC (Text REtrieval Conference), Webdocs, and Wikilinks, and it has been compared to state-of-the-art data mining, bio-inspired and other documents information retrieval based approaches. The results show that the proposed approach improves the quality of returned documents considerably, with a competitive computational time compared to state-of-the-art approaches.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.