Abstract

Background

Genomic variations are associated with the metabolism of many therapeutic agents and with the occurrence of their adverse reactions. Polymorphisms at over 2000 locations in cytochrome P450 (CYP) enzymes arise from factors such as ethnicity, mutation, and inheritance, and they contribute to the diversity of responses to, and side effects of, various drugs. The associations among single nucleotide polymorphisms (SNPs), internal pharmacokinetic patterns, and vulnerability to specific adverse reactions have become one of the research interests of pharmacogenomics. Conventional genome-wide association studies (GWAS) mainly focus on the relation of single or multiple SNPs to a specific risk factor, which is a one-to-many relation. However, there are no robust methods to establish a many-to-many network that combines the direct and indirect associations between multiple SNPs and a series of events (e.g. adverse reactions, metabolic patterns, prognostic factors). In this paper, we present a novel deep learning model based on generative stochastic networks and a hidden Markov chain. It classifies observed samples, based on their SNPs at five loci of two genes (CYP2D6 and CYP1A2), into the populations vulnerable to 14 types of adverse reactions.

Methods

A supervised deep learning model is proposed in this study. It uses a revised generative stochastic network (GSN) model whose transitions are driven by a hidden Markov chain. The training data are collected from clinical observation: the training set is composed of 83 observations of blood samples genotyped on CYP2D6*2, *10, *14 and CYP1A2*1C, *1F. The samples are genotyped by the polymerase chain reaction (PCR) method. The hidden Markov chain is used as the transition operator to simulate the probability distribution.
The model can learn at lower cost than the conventional maximum likelihood method because the transition distribution is conditional on the previous state of the hidden Markov chain. A least absolute shrinkage and selection operator (LASSO) algorithm and a k-nearest neighbors (kNN) algorithm are used as baselines for comparison and to evaluate the performance of the proposed deep learning model.

Results

Fifty-three adverse reactions were reported during the observation period, and they are assigned to 14 categories. In the comparison of classification accuracy, the deep learning model outperforms the LASSO and kNN models, with an accuracy rate above 80%. In the comparison of reliability, the deep learning model shows the best stability of the three models.

Conclusions

Machine learning provides a new way to explore the complex associations between genomic variations and multiple events in pharmacogenomics studies. The new deep learning algorithm is capable of classifying various SNPs to the corresponding adverse reactions. We expect that, as more genomic variations are added as features and more observations are made, the deep learning model will improve its performance and can act as a black-box but reliable verifier for other GWAS studies.
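The Methods description relies on one property of the hidden Markov chain: each transition is conditioned only on the previous hidden state, and the chain's long-run behavior is governed by its stationary distribution. In a GSN, the transition operator is learned so that its stationary distribution approximates the data distribution. The toy sketch below is not the authors' model. It uses a hand-picked, hypothetical 3-state chain (`TRANSITION`, `EMIT_P` are illustrative assumptions) to show a transition operator being sampled step by step, and how the empirical state frequencies approach the stationary distribution obtained by repeatedly applying the same operator.

```python
import numpy as np

# Hypothetical 3-state hidden chain. Row i holds P(h_t = j | h_{t-1} = i).
# Illustrative numbers only; they are not parameters from the study.
TRANSITION = np.array([
    [0.80, 0.15, 0.05],
    [0.10, 0.80, 0.10],
    [0.05, 0.15, 0.80],
])
# Hypothetical emission model: P(x_t = 1 | h_t) for one binary observed
# variable, e.g. "adverse reaction observed" (1) or not (0).
EMIT_P = np.array([0.05, 0.40, 0.90])


def run_chain(n_steps, h0=0, seed=0):
    """Sample hidden states and observations from the toy chain.

    Each hidden state depends only on the previous hidden state (the Markov
    property), and each observation depends only on the current hidden state.
    """
    rng = np.random.default_rng(seed)
    n_states = TRANSITION.shape[0]
    hidden = np.empty(n_steps, dtype=int)
    observed = np.empty(n_steps, dtype=int)
    h = h0
    for t in range(n_steps):
        h = rng.choice(n_states, p=TRANSITION[h])  # h_t ~ P(h | h_{t-1})
        hidden[t] = h
        observed[t] = int(rng.random() < EMIT_P[h])  # x_t ~ P(x | h_t)
    return hidden, observed


def stationary_distribution(P, iters=1000):
    """Power iteration: repeatedly apply the transition operator to a distribution."""
    pi = np.full(P.shape[0], 1.0 / P.shape[0])
    for _ in range(iters):
        pi = pi @ P
    return pi


pi = stationary_distribution(TRANSITION)
hidden, observed = run_chain(50_000)
print("stationary:", pi.round(3))
print("empirical: ", (np.bincount(hidden, minlength=3) / hidden.size).round(3))
```

Because the chain never looks further back than one step, each sampling step costs a single draw from a row of `TRANSITION`. This is the kind of local, conditional computation the abstract contrasts with full maximum likelihood estimation.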

Highlights

  • Genomic variations are associated with the metabolism and the occurrence of adverse reactions of many therapeutic agents

  • The first baseline method uses a least absolute shrinkage and selection operator (LASSO) algorithm to establish the connection between single nucleotide polymorphisms (SNPs) and adverse drug reactions (ADRs) [12]

  • The LASSO algorithm is labeled M1, the k-nearest neighbors (kNN) algorithm is labeled M2, and the proposed generative stochastic network (GSN) based generative algorithm is labeled M3
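The kNN baseline (M2) assigns a sample to the adverse-reaction category that is most common among its k closest training samples in genotype space. The sketch below illustrates that idea under stated assumptions. The study does not describe its feature encoding, so the allele-count encoding over the five loci (`LOCI`, `encode`) is hypothetical, as are the toy samples and the two labels. Real use would involve the study's 14 categories, which majority voting handles without change.

```python
import math
from collections import Counter

# Assumed feature encoding: number of copies (0, 1 or 2) of each variant allele,
# in a fixed locus order. The study does not specify its encoding.
LOCI = ["CYP2D6*2", "CYP2D6*10", "CYP2D6*14", "CYP1A2*1C", "CYP1A2*1F"]


def encode(allele_counts):
    """Map {locus: variant-allele count} to a fixed-order tuple; missing loci count as 0."""
    return tuple(allele_counts.get(locus, 0) for locus in LOCI)


def knn_predict(train_x, train_y, query, k=3):
    """Predict a label for `query` by majority vote among its k nearest training samples."""
    # Stable sort on Euclidean distance only, so labels are never compared.
    ranked = sorted(
        zip((math.dist(x, query) for x in train_x), train_y),
        key=lambda pair: pair[0],
    )
    nearest = [label for _, label in ranked[:k]]
    counts = Counter(nearest)
    top = max(counts.values())
    # On a tied vote, prefer the tied label that appears first (closest to the query).
    return next(label for label in nearest if counts[label] == top)


def accuracy(train_x, train_y, test_x, test_y, k=3):
    """Fraction of test samples whose predicted label matches the true label."""
    hits = sum(knn_predict(train_x, train_y, q, k) == y for q, y in zip(test_x, test_y))
    return hits / len(test_y)


# Hypothetical toy samples and labels, for illustration only (not study data).
train_x = [
    encode({"CYP2D6*10": 2, "CYP1A2*1C": 1}),
    encode({"CYP2D6*10": 2, "CYP1A2*1C": 2}),
    encode({"CYP2D6*2": 1, "CYP1A2*1F": 1}),
    encode({"CYP2D6*2": 2, "CYP1A2*1F": 1}),
    encode({"CYP2D6*10": 1, "CYP1A2*1C": 1}),
]
train_y = ["ADR", "ADR", "none", "none", "ADR"]
print(knn_predict(train_x, train_y, encode({"CYP2D6*10": 2, "CYP1A2*1C": 1, "CYP1A2*1F": 1})))
```

kNN has no training step, which makes it a simple reference point. Its predictions can still shift noticeably with small changes to k or to the training set, and this matters when the three models are compared on stability.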

Summary

Introduction

Genomic variations are associated with the metabolism of many therapeutic agents and with the occurrence of their adverse reactions. A genome-wide association study (GWAS) explores the correlations between genomic variations and a series of genetic risk factors. It aims to reveal how changes in DNA sequence affect gene expression and proteins, and how these effects lead to macro-level factors such as disease susceptibility, prognostic factors, and metabolic patterns [1]. A false-positive association may go undetected when the SNP involved lies in a region of high linkage disequilibrium (LD). Such errors accumulate across the associative network of SNPs and eventually produce erroneous information that causes various problems. Pharmacogenomics studies illustrate the cost: a false linkage can lead either to false predictions of risk or to potential dangers from adverse drug reactions after the products are on the market.
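A small worked example shows why high LD hides false positives. Two biallelic loci are usually compared with the standard pairwise measures D = p_AB − p_A·p_B and r² = D² / (p_A(1 − p_A)·p_B(1 − p_B)). If a non-causal SNP has r² close to 1 with a causal variant, it carries almost the same association signal, and an association test alone cannot tell the two apart. The sketch computes both measures; the function name and the frequencies used are hypothetical.

```python
def linkage_disequilibrium(p_ab, p_a, p_b):
    """Return (D, r^2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A at locus 1 and allele B at locus 2
    p_a, p_b: frequencies of allele A and allele B
    """
    d = p_ab - p_a * p_b  # D = 0 when the two alleles occur independently
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, r2


# Hypothetical frequencies, for illustration only.
print(linkage_disequilibrium(0.30, 0.30, 0.30))  # A always co-occurs with B
print(linkage_disequilibrium(0.20, 0.30, 0.30))  # partial LD
print(linkage_disequilibrium(0.09, 0.30, 0.30))  # independent loci
```

In the first case r² is 1 (up to floating-point rounding), so the two SNPs are statistically interchangeable. In the last case, D and r² are essentially 0.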

