Selecting Relevant Genes From Microarray Datasets Using a Random Forest Model

Hui Xia, Metin Akay, Yasemin M Akay

doi:10.1109/access.2021.3092368

Abstract

Recent studies have demonstrated that microarray expression data can be used to identify gene regulatory pathways. However, a major challenge is building an efficient computational model from large microarray datasets (genes and microRNAs). There is therefore an urgent need to reduce the dimensionality of these datasets using machine learning methods without compromising accuracy, which requires an algorithm that can select the significant features. In this study, we use a supervised method based on a Random Forest to identify significant features from three microarray datasets, from prenatal nicotine, alcohol, and combined nicotine and alcohol exposure groups, in two cell types (dopamine and non-dopamine neurons). Our approach reduced the dimensionality of extremely large microarray datasets in a computationally efficient manner. Furthermore, our results indicated that using only the top 20% of features was sufficient to confirm the genetic pathways previously identified when using all of the features in the model.
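The selection step described above (rank features with a Random Forest, keep the top 20%) can be sketched as follows. This is a minimal illustration and not the authors' implementation: it assumes binary class labels, uses a deliberately simplified forest of depth-1 trees (decision stumps) scored by Gini impurity decrease, and runs on synthetic data. The function names `forest_importances` and `top_fraction`, the tree count, and the data layout are all hypothetical choices made for the example.

```python
import random


def gini(labels):
    """Gini impurity of a list of binary (0/1) labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2.0 * p * (1.0 - p)


def best_stump(X, y, rows, feats):
    """Return (feature, impurity decrease) of the best single split among feats."""
    parent = gini([y[i] for i in rows])
    best_f, best_gain = None, 0.0
    for f in feats:
        values = sorted({X[i][f] for i in rows})
        # Candidate thresholds are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [y[i] for i in rows if X[i][f] <= t]
            right = [y[i] for i in rows if X[i][f] > t]
            child = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if parent - child > best_gain:
                best_f, best_gain = f, parent - child
    return best_f, best_gain


def forest_importances(X, y, n_trees=200, seed=0):
    """Normalized impurity-decrease importance per feature (assumed scoring rule)."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    m = max(1, int(p ** 0.5))  # features tried per tree, the common sqrt(p) rule
    imp = [0.0] * p
    for _ in range(n_trees):
        rows = [rng.randrange(n) for _ in range(n)]  # bootstrap sample of rows
        feats = rng.sample(range(p), m)              # random feature subset
        f, gain = best_stump(X, y, rows, feats)
        if f is not None:
            imp[f] += gain
    total = sum(imp)
    return [v / total for v in imp] if total > 0 else imp


def top_fraction(importances, fraction=0.2):
    """Indices of the highest-ranked features, e.g. the top 20%."""
    k = max(1, round(len(importances) * fraction))
    order = sorted(range(len(importances)), key=lambda j: importances[j], reverse=True)
    return order[:k]


# Synthetic stand-in for an expression matrix: 40 samples, 10 features,
# with features 3 and 7 constructed to track the class label.
rng = random.Random(1)
y = [i % 2 for i in range(40)]
X = [[rng.gauss(0, 1) for _ in range(10)] for _ in y]
for i, label in enumerate(y):
    X[i][3] = 4 * label + rng.gauss(0, 0.5)
    X[i][7] = -4 * label + rng.gauss(0, 0.5)

selected = top_fraction(forest_importances(X, y), fraction=0.2)
print(sorted(selected))
```

A full Random Forest grows deeper trees and would normally come from an established library; the stump version is used here only to keep the ranking-and-thresholding logic visible in a few lines.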

Highlights

Microarrays enable the global screening of gene expression profiles by quantifying the changes in the regulation of thousands of genes [1].
Animal experiments: The microarray data were collected from dopaminergic and non-dopaminergic neurons obtained from the rat ventral tegmental area (VTA).
All experiments were performed in accordance with the protocols approved by the Institutional Animal Care and Use Committee (IACUC) and the University of Houston Animal Care Operations (ACO).

Summary

Introduction

Microarrays enable the global screening of gene expression profiles by quantifying the changes in the regulation of thousands of genes [1]. Microarrays have been adopted to identify gene regulation pathways [2] using supervised or unsupervised machine learning methods. The large number of features limits model reliability and, in many cases, may cause overfitting [3]. To improve the efficiency of gene regulatory network modelling, the dimensionality of the features, including messenger RNAs (mRNAs, genes) and microRNAs (miRNAs), needs to be reduced [4]. Two approaches are available for reducing the dimensionality of complex datasets: unsupervised and supervised methods. In unsupervised learning, a large number of samples and features degrades the computational performance of the underlying learning algorithm. The Hill Climb (HC) unsupervised learning algorithm for dimensionality reduction has been widely used in practice to improve computational efficiency [5].

Methods

Results

Discussion

Conclusion


Journal: IEEE Access
Publication Date: Jan 1, 2021
Citations: 4
License type: CC BY 4.0


