Abstract
It is now widely believed that low-frequency variants may play an important role in cancers. In addition, low-frequency variants are usually closely tied to specific populations. To identify disease-associated variants, several projects are under way to build references of genetic variation from different populations. Furthermore, numerous tools and databases are available to compare dynamic gene expression among different organs and to predict the functional effects of genetic variants. These resources make it easier to identify the genetic roots of cancers in populations. Despite the availability of population allele frequency information, gene expression databases, and functional effect predictions for protein mutations, a comprehensive platform that integrates these annotations for human genetic variants is still lacking. In this study, we propose a web-based database that identifies potential cancer-related variants by collecting variant information from commonly used databases and integrating those data for comprehensive analysis. The website lets users upload variant call format (VCF) files for variant annotation. Importantly, we integrated population allele frequency information from the NHLBI GO Exome Sequencing Project (ESP), the 1000 Genomes Project, and the Tohoku Medical Megabank Project to help users explore the correlation between disease and population. We also collected gene expression profiles from The Human Protein Atlas, Expression Atlas, and NCBI SRA for different organs of human, mouse, and zebrafish, respectively, to reflect the relationship between gene expression and genetic variation in a specific organ. Overall, the database helps users predict the functional effects of mutations on proteins and analyze population allele frequency and gene expression information for the disease variants they provide. To demonstrate the proposed system, we use three EGFR mutations.
A recent study reported that Asian patients with non-small cell lung cancer (NSCLC) carry EGFR mutations at a higher rate than non-Asian patients. These mutations, such as chr7:55241708G>C (G719A), chr7:55249005G>T (S768I) and chr7:55259515T>G (L858R), are found in approximately 30% of Asian (Japanese) patients. In the functional prediction results, all three sites are exonic, nonsynonymous mutations. The REVEL scores of the three EGFR mutations are 0.824, 0.765 and 0.961, and the GERP++ scores are 5.5, 5.85 and 5.71, respectively. Both REVEL and GERP++ give high scores, suggesting that the mutations severely damage protein structure. From these results, researchers can infer that the three mutations are pathogenic variants located at nucleotide positions under high constraint.
Citation Format: Li Mei Chiang, Chien Yueh Lee, Liang Chuan Lai, Mong Hsun Tsai, Tzu Pin Lu, Eric Y. Chuang. VariED: an integrated database of variants and gene expression profiles for genetic diseases [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2017; 2017 Apr 1-5; Washington, DC. Philadelphia (PA): AACR; Cancer Res 2017;77(13 Suppl):Abstract nr 3573. doi:10.1158/1538-7445.AM2017-3573
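The variant notation used in the abstract (for example chr7:55241708G>C) maps directly onto the fixed CHROM, POS, REF and ALT columns of a VCF record. The following is a minimal sketch of that mapping, not the VariED implementation: the function names `vcf_line_to_keys` and `read_vcf_keys` are hypothetical, the example records are built from the three EGFR coordinates quoted above, and the code assumes plain tab-separated VCF text.

```python
def vcf_line_to_keys(line):
    """Turn one VCF data line into 'chrom:posREF>ALT' keys, one per ALT allele.

    Hypothetical helper for illustration only.
    """
    # The first five fixed VCF columns are CHROM, POS, ID, REF, ALT.
    fields = line.rstrip("\n").split("\t")
    chrom, pos, _variant_id, ref, alt = fields[:5]
    # Normalize the chromosome name to the 'chr' style used in the abstract.
    if not chrom.startswith("chr"):
        chrom = "chr" + chrom
    # ALT may list several comma-separated alleles; emit one key for each.
    return [f"{chrom}:{pos}{ref}>{allele}" for allele in alt.split(",")]


def read_vcf_keys(lines):
    """Collect variant keys from VCF lines, skipping headers and blank lines."""
    keys = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        keys.extend(vcf_line_to_keys(line))
    return keys


# Example records assembled from the three EGFR positions in the abstract.
example_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "7\t55241708\t.\tG\tC\t.\t.\t.",
    "7\t55249005\t.\tG\tT\t.\t.\t.",
    "7\t55259515\t.\tT\tG\t.\t.\t.",
]

keys = read_vcf_keys(example_vcf)
print(keys)
```

Keys in this form could then serve as lookup identifiers against allele frequency or functional score tables; how VariED performs that lookup is not described in the abstract.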