Named entity recognition (NER), which benefits many information retrieval applications, has shown impressive progress. Recently, there has been a growing trend to decompose complex NER tasks into two subtasks (e.g., entity span detection (ESD) and entity type classification (ETC)) to achieve better performance. Despite this remarkable success, from the perspective of representation, existing methods do not explicitly distinguish non-entities from entities, which may lead to ESD errors. Likewise, they do not explicitly distinguish entities of different entity types, which may lead to entity type misclassification. These limited representation abilities may thus hinder otherwise competitive NER methods, leading to unsatisfactory performance, especially in low-resource settings (e.g., cross-domain NER). In light of these challenges, we propose to utilize contrastive learning to refine the originally chaotic representations and learn generalized representations for cross-domain NER. In particular, this article proposes a dual contrastive learning model (Dual-CL), which utilizes a token-level contrastive learning module and a sentence-level contrastive learning module to enhance ESD and ETC, respectively, for cross-domain NER. Empirical results on 10 domain pairs under two different settings show that Dual-CL outperforms the compared baselines in terms of several standard metrics. Moreover, detailed analyses are presented to better understand the effectiveness of each component.
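The abstract does not specify the exact loss forms used by Dual-CL's two modules. As a minimal illustrative sketch only, the following PyTorch snippet implements a generic supervised contrastive (InfoNCE-style) objective over token embeddings grouped by entity-type labels, which is one plausible instantiation of the token-level module; the function name, arguments, and the specific loss form are assumptions for illustration, not the paper's actual implementation.

```python
# Illustrative sketch only: a generic supervised contrastive objective over
# token embeddings. The abstract does not give Dual-CL's actual loss; all
# names (token_contrastive_loss, embeddings, labels, temperature) are
# hypothetical.
import torch
import torch.nn.functional as F

def token_contrastive_loss(embeddings: torch.Tensor,
                           labels: torch.Tensor,
                           temperature: float = 0.1) -> torch.Tensor:
    """Pull together tokens sharing an entity-type label (including a shared
    non-entity label) and push apart tokens with different labels.

    embeddings: (N, d) token representations from the encoder
    labels:     (N,) integer entity-type labels (e.g., 0 = non-entity)
    """
    z = F.normalize(embeddings, dim=-1)      # compare in cosine-similarity space
    sim = z @ z.t() / temperature            # (N, N) similarity logits

    # Exclude self-similarity on the diagonal.
    n = z.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, float('-inf'))

    # Positives: other tokens carrying the same label.
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask

    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)

    # Average log-probability of positives per anchor; skip anchors that
    # have no positive in the batch.
    pos_counts = pos_mask.sum(dim=1)
    valid = pos_counts > 0
    mean_log_prob_pos = (log_prob.masked_fill(~pos_mask, 0.0).sum(dim=1)[valid]
                         / pos_counts[valid])
    return -mean_log_prob_pos.mean()
```

A sentence-level analogue would apply the same contrastive idea to sentence representations rather than token representations, per the abstract's description of the second module; the specifics are in the full paper.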