Abstract
This study feeds publicly available gene-expression data from peripheral blood mononuclear cells (PBMCs) into a machine learning model trained with logistic regression to accurately predict the probability of the early onset of Multiple Sclerosis (MS) by identifying biomarkers in gene expression, and it establishes logistic regression as a viable method for predicting disease from genetic data. Current detection methods for neurological diseases, such as MRI scans of existing lesions, do little to help alleviate most of a patient’s symptoms, because they can detect the disease only after it has already developed. Machine learning is a rapidly emerging tool with considerable potential not only for disease detection but also for early diagnosis. This study used the NCBI Gene Expression Omnibus (GEO) data repository to select key PBMC gene-expression datasets for training the logistic regression model. Filtering the data by log fold change and p-value simplified it, which reduced model dimensionality, improved model accuracy, and identified important gene markers of MS. Extensive filtering eliminated nearly 33,000 genes, and 15 genes were flagged as statistically significant in the development of MS. The model reached nearly 100% accuracy, although the lack of representative data highlights the need for further testing. The methodology of this experiment, from data collection to the construction and testing of the model, demonstrates the value artificial intelligence can bring to genomic analysis for disease detection.
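The two steps named in the abstract (filtering genes by log fold change and p-value, then fitting a logistic regression classifier) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the study's pipeline: the thresholds (|log2 fold change| ≥ 1, p ≤ 0.05), the Welch t-test with a normal approximation, the function names, and the synthetic data are all hypothetical choices made for this example, and real GEO data would normally be analyzed with a dedicated statistics library.

```python
import math
import random
from statistics import mean, pstdev, variance


def log2_fold_change(case, control, pseudocount=1.0):
    # log2 ratio of mean expression (MS vs. control); assumes non-negative
    # expression values, with a pseudocount to avoid log(0).
    return math.log2((mean(case) + pseudocount) / (mean(control) + pseudocount))


def welch_p_value(case, control):
    # Two-sided p-value for Welch's t statistic, approximated with the standard
    # normal distribution (an assumption; a t distribution is more exact for small n).
    se = math.sqrt(variance(case) / len(case) + variance(control) / len(control))
    if se == 0.0:
        return 1.0  # no variability: treat the gene as uninformative
    t = (mean(case) - mean(control)) / se
    return math.erfc(abs(t) / math.sqrt(2.0))


def select_genes(X, y, lfc_min=1.0, p_max=0.05):
    # Indices of genes passing both the |log2 fold change| and p-value cut-offs.
    kept = []
    for g in range(len(X[0])):
        case = [row[g] for row, label in zip(X, y) if label == 1]
        ctrl = [row[g] for row, label in zip(X, y) if label == 0]
        if abs(log2_fold_change(case, ctrl)) >= lfc_min and welch_p_value(case, ctrl) <= p_max:
            kept.append(g)
    return kept


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=500):
    # Batch gradient descent on the mean log-loss; returns (weights, bias).
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for row, label in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, row)) + b) - label
            gw = [gj + err * xj for gj, xj in zip(gw, row)]
            gb += err
        w = [wj - lr * gj / n for wj, gj in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def predict_proba(w, b, row):
    # Predicted probability that a sample belongs to the MS class.
    return sigmoid(sum(wj * xj for wj, xj in zip(w, row)) + b)


# Synthetic stand-in for a PBMC expression matrix (hypothetical data, not GEO):
# 40 samples x 50 genes; genes 0-2 are up-regulated in the MS-labelled samples.
random.seed(0)
labels = [i % 2 for i in range(40)]
expr = [[random.gauss(15.0 if (label == 1 and g < 3) else 5.0, 1.0) for g in range(50)]
        for label in labels]
X_train, y_train = expr[:30], labels[:30]
X_test, y_test = expr[30:], labels[30:]

# Select genes and compute scaling statistics on the training split only.
genes = select_genes(X_train, y_train)
mu = [mean(row[g] for row in X_train) for g in genes]
sd = [pstdev([row[g] for row in X_train]) for g in genes]


def standardize(rows):
    # Keep only the selected genes and z-score them with training statistics.
    return [[(row[g] - m) / s for g, m, s in zip(genes, mu, sd)] for row in rows]


w, b = fit_logistic(standardize(X_train), y_train)
preds = [int(predict_proba(w, b, row) >= 0.5) for row in standardize(X_test)]
accuracy = sum(p == t for p, t in zip(preds, y_test)) / len(y_test)
print("selected genes:", genes, "test accuracy:", accuracy)
```

The design choice worth noting is that gene selection and scaling use the training split only. Filtering on the full dataset before splitting lets test-set labels influence which genes are kept, which can inflate reported accuracy; with a small, unrepresentative cohort, a held-out or external validation set is the more reliable check.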