Predicting Autism Spectrum Disorder Using Pluripotent Stem Cell RNA-Seq Data and Machine Learning

Richard Li

doi:10.56397/jimr/2023.09.07

Abstract

In this work, gene expression datasets for Autism Spectrum Disorder (ASD) were analyzed with the goal of selecting the genes most strongly associated with ASD and classifying samples with machine learning algorithms. Two publicly available human RNA-seq gene count datasets (GSE129806 and GSE214323) were downloaded from the Gene Expression Omnibus database. Workflows were then developed that combine differential expression analysis, principal component analysis (PCA), gene set enrichment analysis (GSEA) (Subramanian et al., 2005), and gene expression meta-analysis (Toro-Domínguez et al., 2020). The datasets were then passed through pipelines that used machine learning algorithms to build prediction models for classification. The results of this exploratory study suggest that gene expression profiles from pluripotent stem cell samples derived from individuals with ASD can be used, together with machine learning techniques, to identify a biological signature for ASD. In particular, gene expression meta-analysis across multiple datasets and larger numbers of samples could lead to more practical tools, such as machine learning models and workflows, for detecting ASD at an early age in the general population.

Full Text