Intelligent information retrieval system using automatic thesaurus construction

Wei Song, Jucheng Yang, Chenghua Li, Sooncheol Park

doi:10.1080/03081079.2010.530026

Abstract

This paper presents an intelligent information retrieval (IR) system based on automatic thesaurus construction and applies it to document clustering and classification. These two applications are among the most influential and widely used in the IR research community. We apply two biologically inspired algorithms, a genetic algorithm (GA) and a neural network (NN), to these two tasks. We propose a fuzzy logic controller GA and an adaptive back-propagation NN, which overcome problems found in their archetypes, e.g. slow evolution and a tendency to become trapped in a local optimum. Furthermore, a well-constructed thesaurus has been recognised as a valuable tool for effective clustering and classification. It addresses a weakness of the bag-of-words document representation, which ignores some important relationships between words, e.g. synonymy and polysemy. To investigate how our IR system could be used effectively, we conduct experiments on four data sets drawn from the benchmark Reuters-21578 document collection and the 20 Newsgroups corpus. The results show that our IR system improves performance in comparison with k-means, the common GA, and the conventional back-propagation NN.

Full Text