Relevance Feedback using Genetic Algorithm on Information Retrieval for Indonesian Language Documents

Salman Dziyaul Azmi, Retno Kusumaningrum

doi:10.20473/jisebi.5.2.171-182

Open Access

Abstract

Background: The rapid growth of technological development in Indonesia has resulted in an increasing amount of information. A new information retrieval environment is therefore needed to find documents that match the user's information needs.

Objective: This study examines the differences between an information retrieval system that uses genetic algorithm-based Relevance Feedback (RF) and a standard information retrieval system without relevance feedback, for Indonesian language documents.

Methods: The standard Information Retrieval (IR) system uses the Sastrawi stemmer and the Vector Space Model, while the Genetic Algorithm-based (GA-based) relevance feedback uses roulette-wheel selection and crossover recombination. The evaluation metrics are Mean Average Precision (MAP) and average recall, both based on user judgments.

Results: On two Indonesian language document datasets, thesis abstracts and news articles, GA-based relevance feedback increased MAP by 15.2% and 28.6%, respectively, compared with the standard IR system. Recall at the 10th position also improved by 7.1% and 10.5%, respectively. The best GA parameters for the thesis abstract dataset were a population size of 20, a crossover probability of 0.7, and a mutation probability of 0.2; for the news dataset, they were a population size of 10, a crossover probability of 0.5, and a mutation probability of 0.2.

Conclusion: GA-based relevance feedback increases both MAP and average recall at the 10th retrieved document. In general, the best mutation probability is 0.2, whereas the best population size and crossover probability depend on the size of the dataset and the length of the query.

Keywords: Genetic Algorithm, Information Retrieval, Indonesian language document, Mean Average Precision, Relevance Feedback
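
The baseline system in the Methods (stemming followed by Vector Space Model ranking) can be illustrated with a minimal TF-IDF and cosine-similarity sketch. It assumes a common raw-count TF times log(N/df) IDF weighting; the paper's exact weighting scheme is not stated here. The `stem` function is a hypothetical placeholder for the Sastrawi stemmer the study uses, kept as an identity function so the sketch runs with the standard library alone, and all function names are illustrative rather than taken from the paper.

```python
import math
import re
from collections import Counter


def stem(token):
    # Hypothetical placeholder: the study stems Indonesian text with Sastrawi
    # (e.g. PySastrawi's StemmerFactory().create_stemmer().stem). Identity here
    # so the sketch runs with the standard library only.
    return token


def tokenize(text):
    # Lowercase, extract word tokens, then stem each one.
    return [stem(t) for t in re.findall(r"\w+", text.lower())]


def build_index(docs):
    # One sparse TF-IDF vector (dict term -> weight) per document, plus the IDF table.
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for toks in tokenized for term in set(toks))
    idf = {term: math.log(n / count) for term, count in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(toks).items()} for toks in tokenized]
    return vectors, idf


def vectorize_query(query, idf):
    # Weight query terms with the collection IDF; terms unseen in the collection are dropped.
    counts = Counter(tokenize(query))
    return {t: tf * idf[t] for t, tf in counts.items() if t in idf}


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(query, docs):
    # (doc_index, score) pairs sorted by descending cosine similarity.
    vectors, idf = build_index(docs)
    q = vectorize_query(query, idf)
    scores = [(i, cosine(q, v)) for i, v in enumerate(vectors)]
    return sorted(scores, key=lambda s: s[1], reverse=True)


docs = [
    "sistem temu kembali informasi dokumen",
    "algoritma genetika untuk optimasi",
    "berita ekonomi indonesia hari ini",
]
print(rank("algoritma genetika", docs))
```

Here the second document, the only one containing both query terms, is ranked first; the other two share no terms with the query and score zero. Relevance feedback then aims to improve this initial ranking using the user's judgments.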

Highlights

Nowadays, the massive growth of technology has reshaped the way tasks are carried out in the digital domain [1], and the ease of receiving and conveying information has led to a larger pool of information resources and increased use of databases
On two Indonesian language document datasets, thesis abstracts and news articles, GA-based relevance feedback increased Mean Average Precision (MAP) by 15.2% and 28.6%, respectively, compared with the standard Information Retrieval System
Because the method improved the performance of the information retrieval system by returning more relevant search results to the user, genetic algorithms can be used to support information retrieval for Indonesian language documents
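
The evaluation metrics in the highlights, MAP and recall at the 10th position, can be computed from a ranked result list and the set of user-judged relevant documents. The sketch below uses one common definition of average precision (dividing by the total number of relevant documents, so relevant documents never retrieved contribute zero); the paper's exact formula is not given here, and the function names are hypothetical. Averaging `recall_at_k(..., k=10)` over all queries gives an average recall at rank 10.

```python
def average_precision(ranked, relevant):
    # Sum of precision@k over the ranks k at which a relevant document appears,
    # divided by the total number of relevant documents.
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for k, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / k
    return precision_sum / len(relevant)


def mean_average_precision(runs):
    # runs: one (ranked_list, relevant_set) pair per query.
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


def recall_at_k(ranked, relevant, k=10):
    # Fraction of the relevant documents that appear in the top-k results.
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & set(relevant)) / len(relevant)


runs = [
    (["d1", "d2", "d3", "d4"], {"d1", "d3"}),  # AP = (1/1 + 2/3) / 2 = 5/6
    (["d5", "d6"], {"d6"}),                    # AP = (1/2) / 1 = 1/2
]
print(mean_average_precision(runs))  # (5/6 + 1/2) / 2, about 0.667
print(recall_at_k(["d1", "d2", "d3", "d4"], {"d1", "d3"}, k=2))  # 0.5
```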

Summary

Introduction

The massive growth of technology has reshaped the way tasks are carried out in the digital domain [1], and the ease of receiving and conveying information has led to a larger pool of information resources and increased use of databases. Users often have to run several additional search queries to locate the documents they need, because the search results match only a small portion of their information needs [3].

Objective: This study examines the differences between an information retrieval system that uses genetic algorithm-based Relevance Feedback (RF) and a standard information retrieval system without relevance feedback, for Indonesian language documents.

Methods: The standard Information Retrieval (IR) system uses the Sastrawi stemmer and the Vector Space Model, while the Genetic Algorithm-based (GA-based) relevance feedback uses roulette-wheel selection and crossover recombination.

Results: On two Indonesian language document datasets, thesis abstracts and news articles, MAP increased by 15.2% and 28.6%, respectively, compared with the standard IR system.

Conclusion: GA-based relevance feedback increases both MAP and average recall at the 10th retrieved document. The best mutation probability is 0.2, whereas the best population size and crossover probability depend on the size of the dataset and the length of the query.
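
The GA-based relevance feedback step can be sketched with the roulette-wheel selection, crossover, and mutation named in the Methods, defaulting to the population size of 20, crossover probability of 0.7, and mutation probability of 0.2 that the study reports as best for the thesis abstract dataset. Everything else is an assumption for illustration, not the authors' implementation: the chromosome encoding (a vector of query term weights), the fitness function (mean cosine similarity to the documents the user judged relevant), single-point crossover, random-reset mutation per gene, and all function names.

```python
import math
import random


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def fitness(chromosome, relevant_docs):
    # Assumed fitness: mean cosine similarity between a candidate query vector
    # and the user-judged relevant documents (non-negative for non-negative weights).
    return sum(cosine(chromosome, d) for d in relevant_docs) / len(relevant_docs)


def roulette_select(population, fits, rng):
    # Roulette-wheel selection: probability of selection is proportional to fitness.
    total = sum(fits)
    if total == 0:
        return rng.choice(population)
    pick = rng.uniform(0, total)
    running = 0.0
    for chrom, f in zip(population, fits):
        running += f
        if running >= pick:
            return chrom
    return population[-1]


def crossover(p1, p2, pc, rng):
    # Single-point crossover, applied with probability pc; otherwise copy the parents.
    if len(p1) > 1 and rng.random() < pc:
        point = rng.randint(1, len(p1) - 1)
        return p1[:point] + p2[point:], p2[:point] + p1[point:]
    return p1[:], p2[:]


def mutate(chrom, pm, rng):
    # Each gene (term weight) is reset to a random value in [0, 1) with probability pm.
    return [rng.random() if rng.random() < pm else w for w in chrom]


def ga_relevance_feedback(query, relevant_docs, pop_size=20, pc=0.7, pm=0.2,
                          generations=50, seed=0):
    rng = random.Random(seed)
    # Initial population: the original query plus perturbed copies of it.
    population = [query[:]] + [mutate(query, 0.5, rng) for _ in range(pop_size - 1)]
    best = max(population, key=lambda c: fitness(c, relevant_docs))
    for _ in range(generations):
        fits = [fitness(c, relevant_docs) for c in population]
        offspring = []
        while len(offspring) < pop_size:
            a = roulette_select(population, fits, rng)
            b = roulette_select(population, fits, rng)
            c1, c2 = crossover(a, b, pc, rng)
            offspring += [mutate(c1, pm, rng), mutate(c2, pm, rng)]
        population = offspring[:pop_size]
        # Keep the best query seen in any generation.
        candidate = max(population, key=lambda c: fitness(c, relevant_docs))
        if fitness(candidate, relevant_docs) > fitness(best, relevant_docs):
            best = candidate
    return best


query = [1.0, 0.0, 0.0, 0.0]          # weights over a 4-term vocabulary
relevant = [[0.9, 0.8, 0.0, 0.1],     # documents the user marked relevant
            [0.7, 0.9, 0.1, 0.0]]
refined = ga_relevance_feedback(query, relevant, generations=30, seed=42)
print(fitness(query, relevant), fitness(refined, relevant))
```

Because the original query is part of the initial population and the best chromosome is only replaced by a strictly fitter one, the refined query is never worse than the original under this fitness function. For the news dataset, the reported best settings would correspond to `pop_size=10, pc=0.5, pm=0.2`.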

Objectives

Methods

Results

Discussion

Conclusion