Information Retrieval Model Research Articles

The present paper describes a Bayesian network approach to Information Retrieval (IR) from Web documents. The network structure provides an intuitive representation of uncertainty relationships, and the embedded conditional probability table is used by inference algorithms to identify documents that are relevant to the user's needs, expressed in the form of Boolean queries. Our research has been directed at constructing a probabilistic IR framework that focuses on assisting users in performing ad hoc retrieval of documents from various domains such as economics, news, and sports. Furthermore, users can integrate feedback regarding the relevance of the retrieved documents in an attempt to improve performance on upcoming requests. Towards these goals, we have expanded the traditional Bayesian network IR system and tested it on several Greek web corpora from different application domains. We have developed two different approaches with regard to the structure: a simple one, where the structure is manually provided, and an automated one, where data mining is used to extract the network's structure. Results show competitive performance, in terms of precision-recall curves, against successful IR models of different theoretical backgrounds, such as the vector space model utilizing tf-idf and the probabilistic BM25 model. In order to further improve the performance of the IR system, we have implemented a novel similarity-based lemmatization framework, thus reducing the ambiguity posed by the plethora of morphological variations of the languages in question. The employed lemmatization framework comprises three core components (the word segregation, data cleansing, and lemmatization modules) and is language-independent, i.e. it can be applied to other languages with morphological peculiarities and thus improve ad hoc retrieval. It achieves the mapping of an input word to its normalized form by employing two state-of-the-art language-independent distance metric models, namely the Levenshtein edit distance and the Dice coefficient similarity measure, combined with a language model describing the most frequent inflectional suffixes of the examined language. Experimental results support our claim on the significance of this incorporation to Greek web text retrieval, as results improve by 4% to 11%.
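The two distance measures named in the abstract can be illustrated with a minimal sketch. This is not the authors' framework: the abstract does not say how the measures are combined, so the helper `closest_lemma`, its tie-breaking rule, and the toy lexicon are hypothetical, and the suffix language model is omitted entirely. The Dice coefficient is computed here over sets of character bigrams, which is one common variant and an assumption about the paper's choice.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of single-character insertions,
    deletions, and substitutions needed to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # delete ch_a
                               current[j - 1] + 1,       # insert ch_b
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]


def bigrams(word: str) -> set:
    """Set of adjacent character pairs in word."""
    return {word[i:i + 2] for i in range(len(word) - 1)}


def dice(a: str, b: str) -> float:
    """Dice coefficient over character-bigram sets, in [0, 1]."""
    x, y = bigrams(a), bigrams(b)
    if not x and not y:
        # Both words are too short to have bigrams; fall back to equality.
        return 1.0 if a == b else 0.0
    return 2 * len(x & y) / (len(x) + len(y))


def closest_lemma(word: str, lexicon: list) -> str:
    """Hypothetical combination (an assumption, not from the paper):
    prefer the highest Dice score, break ties by lowest edit distance."""
    return max(lexicon, key=lambda cand: (dice(word, cand),
                                          -levenshtein(word, cand)))


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))
    print(dice("night", "nacht"))
    print(closest_lemma("walking", ["walk", "talk", "wall"]))
```

Both measures operate on raw character sequences, which is why an approach of this kind needs no language-specific rules beyond a list of candidate normalized forms.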


Background: The growth of biomedical information requires most information retrieval systems to provide short and specific answers in response to complex user queries. Semantic information in the form of free text is structured in a way that is straightforward for humans to read but more difficult for computers to interpret automatically and search efficiently. One of the reasons is that most traditional information retrieval models assume terms are conditionally independent given a document/passage. Therefore, we are motivated to consider term associations within different contexts to help the models understand semantic information and use it for improving biomedical information retrieval performance.

Results: We propose a term association approach to discover term associations among the keywords from a query. The experiments are conducted on the TREC 2004-2007 Genomics data sets and the TREC 2004 HARD data set. The proposed approach is promising and outperforms the baselines and the GSP results. An investigation of the parameter settings and different indices shows that the sentence-based index produces the best results at the document level, the word-based index at the passage level, and the paragraph-based index at the passage2 level. Furthermore, the best term association results always come from the best baseline. The tuning parameter k in the proposed recursive re-ranking algorithm is discussed and locally optimized to 10.

Conclusions: First, modelling term association with factor analysis to improve biomedical information retrieval is one of the major contributions of our work. Second, the experiments confirm that term association considering co-occurrence and dependency among the keywords can produce better results than the baselines, which treat the keywords independently. Third, the baselines are re-ranked according to the importance and reliance of latent factors behind term associations. These latent factors are determined by the proposed model and by their term appearances in the first-round retrieved passages.


Articles published on Information Retrieval Model

BAYESIAN RETRIEVAL USING A SIMILARITY-BASED LEMMATIZER

Ontology-Oriented Inference-Based Learning Content Management System

Effects of Terms Recognition Mistakes on Requests Processing for Interactive Information Retrieval

Model of Recommendation System for Indexing and Retrieving the Learning Object based on Multiagent System

Modeling and mining term association for improving biomedical information retrieval performance

A novel neighborhood based document smoothing model for information retrieval

Cross-language information retrieval models based on latent topic models trained with document-aligned comparable corpora

An Ontology-Based Information Retrieval Model for Vegetables E-Commerce

Aggregated Search in XML Documents

Relating ontologies with a fuzzy information model

A new term‐weighting scheme for naïve Bayes text categorization

Analysis of a Vector Space Model, Latent Semantic Indexing and Formal Concept Analysis for Information Retrieval

Design of Personalized Intelligent Information Retrieval Model Based on Agent

Personalized Web Search Using Clickthrough Data and Web Page Rating

Semantically enhanced Uyghur Information Retrieval Model

Applying human computation mechanisms to information retrieval

Extracting Collocations from Bengali Text Corpus

An I/O Bandwidth-Sensitive Sparse Matrix-Vector Multiplication Engine on FPGAs

Research and Design on E-government Information Retrieval Model
