The number of patients with hyperthyroidism has been increasing in recent years. Because hyperthyroidism can lead to cardiovascular disease and induce many other diseases, it poses a substantial health risk, and studying its genetic factors will promote the early diagnosis and treatment of hyperthyroidism and its related diseases. Previous studies have used genome-wide association studies (GWAS) to identify genes related to hyperthyroidism. However, these studies only identify sites that are statistically significantly associated with the disease and ignore the complex regulatory relationships between genes. In addition, mutation is not the only genetic factor that causes hyperthyroidism. Identifying hyperthyroidism-related genes from gene interactions would help researchers uncover the disease mechanism. In this paper, we propose a novel machine learning method for identifying hyperthyroidism-related genes based on a gene interaction network. The method, called "RW-RVM," combines Random Walk (RW) and Relevance Vector Machines (RVM). RW was used to encode the gene interaction network. The features of each gene were its regulatory relationships with other genes and with non-coding RNAs. Finally, multiple RVMs were applied to identify hyperthyroidism-related genes. In 10-fold cross-validation, the area under the receiver operating characteristic curve (AUC) of our method reached 0.9, and the area under the precision-recall curve (AUPR) was 0.87. Seventy-eight novel genes were found to be related to hyperthyroidism. We examined two of these novel genes against the existing literature, which supported the accuracy of our results and method.
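As an illustration of how a random walk can encode a gene interaction network into per-gene feature vectors, the following is a minimal sketch only. It assumes a random walk with restart, a restart probability of 0.3, and a four-gene toy network; none of these details are stated in the abstract, and the function name `random_walk_with_restart` is hypothetical. The RVM classification step is not shown.

```python
def random_walk_with_restart(adj, seed, restart=0.3, tol=1e-10, max_iter=1000):
    """Return steady-state visiting probabilities for a walk restarting at `seed`.

    Assumption: the restart variant of the random walk; the abstract only says "Random Walk".
    `adj` is a square adjacency matrix given as a list of lists.
    """
    n = len(adj)
    # Column-normalise so trans[i][j] is the probability of stepping from node j to node i.
    # Nodes without edges would leak probability mass; the toy network below has none.
    col_sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    trans = [[adj[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
             for i in range(n)]

    start = [0.0] * n
    start[seed] = 1.0
    p = start[:]
    for _ in range(max_iter):
        # One step: follow an edge with probability 1 - restart, else jump back to the seed.
        nxt = [(1 - restart) * sum(trans[i][j] * p[j] for j in range(n))
               + restart * start[i]
               for i in range(n)]
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


# Hypothetical toy network: four genes connected in a chain 0 - 1 - 2 - 3.
toy_adj = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

# One feature vector per gene: its proximity profile to every node in the network.
features = [random_walk_with_restart(toy_adj, seed) for seed in range(len(toy_adj))]
```

Each row of `features` is a probability distribution over the network, so genes that sit close together in the network receive similar vectors. In a pipeline like the one described, such vectors would be passed to a classifier.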