Abstract

Next-generation genome sequencing has enabled the discovery of numerous disease- or drug-response-associated nonsynonymous single nucleotide variants (nsSNVs) that alter the amino acid sequence of a protein. Although several studies have attempted to characterize pathogenic nsSNVs, few of these variants have been confirmed as single amino acid variants (SAAVs) at the protein level. Here, we developed the SAAVpedia platform to identify, annotate, and retrieve pathogenic SAAV candidates from proteomic and genomic data. The platform consists of four modules: SAAVidentifier, SAAVannotator, SNV/SAAVretriever, and SAAVvisualizer. SAAVidentifier provides a reference database containing 18 206 090 SAAVs and performs the identification and quality assessment of SAAVs. SAAVannotator provides functional annotation with biological, clinical, and pharmacological information for the interpretation of condition-specific SAAVs. SNV/SAAVretriever enables bidirectional navigation between relevant SAAVs and nsSNVs with diverse genomic and proteomic data. SAAVvisualizer provides various statistical plots based on the functional annotations of detected SAAVs. To demonstrate the utility of SAAVpedia, we applied the proteogenomic pipeline, together with protein-protein interaction network analysis, to proteomic data from breast cancer and glioblastoma patients. We identified 1326 breast-cancer-related genes and 12 glioblastoma-related genes that contained one or more SAAVs, including BRCA2 and FAM49B, respectively. SAAVpedia is thus a suitable platform for confirming whether a genomic variant is maintained in the amino acid sequence. Furthermore, because it recovered SAAVs in these positive controls, SAAVpedia could play a key role in protein functional studies for the Human Proteome Project (HPP).
