Abstract
The number of patients with inflammatory bowel disease (IBD) is increasing worldwide. IBD is recurrent and difficult to cure, and it is also a high-risk factor for colorectal cancer (CRC). The occurrence of IBD is closely related to genetic factors, which prompted us to identify IBD-related genes. Based on the hypothesis that similar diseases are related to similar genes, we proposed an SVM-based method that identifies IBD-related genes from disease similarities and gene interactions. We obtained 135 diseases that are similar to IBD, together with their related genes, and treated these genes as candidate IBD-related genes. We extracted features for each gene and used an SVM to estimate the probability that the gene is related to IBD. Ten-fold cross-validation was applied to verify the effectiveness of our method. The AUC is 0.93 and the AUPR is 0.97, the best among the four methods compared. We prioritized the candidate genes and conducted case studies on the top five genes.
Highlights
Inflammatory bowel disease (IBD) (Graham and Xavier, 2020) is an intestinal inflammatory disorder with a high incidence worldwide; it is divided into Crohn’s disease (CD) (Roda et al., 2020) and ulcerative colitis (UC) (Danese et al., 2020).
We used 9 of the 10 groups to build the Support Vector Machine (SVM) model and the remaining group as the testing set.
We compared our method with several traditional methods: Artificial Neural Network (ANN) (Plumb et al., 2005), Random Forest (RF) (Archer and Kimes, 2008), and Naïve Bayes (NB) (Archer and Kimes, 2008).
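The nine-to-one split and the AUC metric mentioned above can be illustrated with a minimal sketch. It uses only the Python standard library and does not reproduce the authors' SVM pipeline. The function names (`k_fold_indices`, `train_test_splits`, `auc`) and the shuffling seed are assumptions made for illustration; any classifier could be trained on each `train` list and scored on the matching `test` list.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and deal them into k nearly equal groups."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seed is an illustrative choice
    return [indices[i::k] for i in range(k)]


def train_test_splits(n_samples, k=10, seed=0):
    """Yield (train, test) index lists; each group is the test set exactly once."""
    folds = k_fold_indices(n_samples, k, seed)
    for i, test in enumerate(folds):
        # The other k - 1 groups together form the training set.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))
```

In a full evaluation, the per-fold scores would typically be pooled or averaged before reporting a single AUC; which of the two the authors used is not stated here.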
Summary
Inflammatory bowel disease (IBD) (Graham and Xavier, 2020) is an intestinal inflammatory disorder with a high incidence worldwide; it is divided into Crohn’s disease (CD) (Roda et al., 2020) and ulcerative colitis (UC) (Danese et al., 2020). In view of the important role of gene mutation in the development of IBD, genome-wide association studies (GWAS) have been applied to risk prediction and to the study of the mechanism of IBD (Franke et al., 2008; Zhang et al., 2021). Many studies have combined expression quantitative trait loci (eQTL) with GWAS to explore biological functions, but this approach cannot perform large-scale disease-related gene prediction (Zhao et al., 2019, 2020c). These 88 diseases have 15,271 entries, which means that 15,271 genes are known to be related to these 88 diseases. The features of the first blue gene can be represented as its shortest paths to diseases 2 and 3. Each edge of this network represents the interaction strength.
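As a rough illustration of shortest-path features on an interaction network, the sketch below runs Dijkstra's algorithm from one gene and records its distance to the nearest known gene of each disease. Several details are assumptions and are not taken from the paper: the edge cost is taken as `1 / strength` (so that stronger interactions count as shorter, with all strengths assumed positive), the graph is a dict of dicts, and the gene and disease names are hypothetical.

```python
import heapq


def shortest_costs(graph, source):
    """Dijkstra: cheapest path cost from source to every reachable node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for neighbor, strength in graph.get(node, {}).items():
            # Assumption: cost is the inverse of the (positive) interaction strength.
            cost = d + 1.0 / strength
            if cost < dist.get(neighbor, float("inf")):
                dist[neighbor] = cost
                heapq.heappush(heap, (cost, neighbor))
    return dist


def gene_features(graph, gene, disease_genes):
    """One feature per disease: distance from gene to that disease's nearest known gene."""
    dist = shortest_costs(graph, gene)
    inf = float("inf")
    return [
        min((dist.get(g, inf) for g in genes), default=inf)
        for genes in disease_genes.values()
    ]


# Hypothetical toy network: undirected, weights are interaction strengths.
graph = {
    "G1": {"G2": 0.5, "G3": 1.0},
    "G2": {"G1": 0.5, "G4": 1.0},
    "G3": {"G1": 1.0, "G4": 0.25},
    "G4": {"G2": 1.0, "G3": 0.25},
}
disease_genes = {"disease 2": {"G2"}, "disease 3": {"G4"}}
features = gene_features(graph, "G1", disease_genes)  # [2.0, 3.0]
```

A gene that cannot reach any known gene of a disease gets an infinite distance in this sketch; a real pipeline would need to cap or impute that value before passing it to a classifier.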