From bibliometrics to text mining: exploring feature selection methods in microarray research

Guilherme Alberto Sousa Ribeiro, Rommel Melgaço Barbosa, Márcio da Cunha Reis, Nattane Luiza Costa

doi:10.1080/03610918.2024.2331083

Abstract

Text mining (TM) is a technique that aims to extract knowledge from unstructured data sources by transforming them into structured data. TM algorithms can be used to detect hidden patterns in large amounts of data, including bibliometric data. Feature selection has been used to reduce the high dimensionality and complexity of computational problems, including those involving microarray data, which have a large number of features. In this context, this study uses text mining to discover trends in the use of feature selection techniques on microarray data, based on bibliometric data such as titles, abstracts, and keywords. A total of 1448 studies published in journals indexed in the Web of Science database were collected for a bibliometric and TM analysis. One of the main goals of this study was to determine the patterns related to the roles of medical and machine learning methods. The results revealed trends linking microarray research to other medical/biological topics and to machine learning techniques such as feature selection and classification, and they identified commonly used databases and algorithms. Colon, lung, and breast cancers were the most commonly studied using microarray data and feature selection techniques. In addition, support vector machines (SVM) were frequently used for dimensionality reduction and classification tasks. Despite the insightful results based on text mining, more studies are needed to investigate the performance, strengths, and weaknesses of different types of feature selectors applied to microarray data.

Full Text