Abstract
Motivation: Understanding the protein structural context of genomic variants, and how they are patterned on proteins, can help to separate benign from pathogenic variants and reveal molecular consequences. However, mapping genomic coordinates to protein structures is non-trivial, complicated by alternative splicing and transcript evidence.
Results: Here we present VarMap, a web tool for mapping a list of chromosome coordinates to canonical UniProt sequences and associated protein 3D structures, including validation checks, and annotating them with structural information.
Availability and implementation: https://www.ebi.ac.uk/thornton-srv/databases/VarMap.
Supplementary information: Supplementary data are available at Bioinformatics online.
Highlights
The consequence of variants affecting protein sequence depends on the structural context and chemical environment
One of the transcripts is identified as the ‘RefSeq Select transcript’, chosen according to criteria described by NCBI
The information provided by VarMap could be obtained manually using the following existing tools and databases: Ensembl (Cunningham et al, 2019), VEP (McLaren et al, 2016), UniProt (UniProt, 2019), SWISS-PROT (Boutet et al, 2007), BioMart (Kinsella et al, 2011), HGNC (Braschi et al, 2019), CATH (Dawson et al, 2017), Pfam (El-Gebali et al, 2019), M-CSA (Ribeiro et al, 2018), FASTA (Pearson, 2014), PDBsum (Laskowski et al, 2018), ScoreCons (Valdar, 2002), gnomAD (Lek et al, 2016) and ClinVar (Landrum et al, 2018)
Summary
The consequence of variants affecting protein sequence depends on the structural context and chemical environment. Understanding these elements has the potential both to uncover the biochemical consequences of the change and to identify ‘hot spots’ where several variants from different individuals occur within close spatial proximity in the same protein. To benefit from the added information that 3D protein structures can provide, an accurate mapping between genomic coordinates and the corresponding protein sequence and structure is required. Alternative splicing makes mapping genomic coordinates to protein sequence non-trivial. Among the transcripts of a gene, one is identified as the ‘RefSeq Select transcript’, chosen according to criteria described by NCBI (O’Leary et al, 2016), and has a corresponding protein sequence. As the translated RefSeq Select and canonical UniProt sequences are independently derived, they often differ [in 18% of cases in the ClinVar database (Landrum et al, 2018) (Fig. 1C)], which results in different residue numbering.
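The numbering problem described above can be illustrated with a minimal Python sketch. It is not VarMap's implementation: the two sequences are invented toy strings, `map_residue_numbers` is a hypothetical helper, and the standard-library `difflib.SequenceMatcher` stands in for a proper sequence aligner. It only shows how a single insertion in one sequence shifts the residue numbers of everything downstream.

```python
from difflib import SequenceMatcher


def map_residue_numbers(seq_a: str, seq_b: str) -> dict:
    """Map 1-based residue positions in seq_a to positions in seq_b.

    Only residues inside identical matching blocks are mapped;
    residues that fall in unmatched regions are left out.
    """
    matcher = SequenceMatcher(None, seq_a, seq_b, autojunk=False)
    mapping = {}
    for block in matcher.get_matching_blocks():
        for offset in range(block.size):
            mapping[block.a + offset + 1] = block.b + offset + 1
    return mapping


# Toy sequences, invented for illustration only.
transcript_protein = "MKTAYIAKQRQISFVK"
# Same sequence with a hypothetical 3-residue insertion ("GSL") after residue 5.
canonical_protein = "MKTAYGSLIAKQRQISFVK"

mapping = map_residue_numbers(transcript_protein, canonical_protein)

# Residues before the insertion keep their number; later ones shift by 3.
print(mapping[5], mapping[6], mapping[16])
```

A variant reported at residue 6 of the transcript-derived protein would therefore sit at residue 9 of the other sequence in this toy case, which is why an explicit mapping step is needed before placing variants on a structure.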