Abstract
Annotating the impact of a variant on a gene is a vital component of genetic medicine and genetic research. Different gene annotations for the same genomic variant are possible, because different structures and sequences for the same gene are available. The clinical community typically uses RefSeq transcripts (NMs) to annotate gene variation, but these do not always match the reference genome. The scientific community typically uses Ensembl transcripts (ENSTs) to annotate gene variation. These match the reference genome, but often do not match the equivalent NM. Often the transcripts used to annotate gene variation are not provided, impeding interoperability and consistency. Here we introduce the concept of the Clinical Annotation Reference Template (CART). CARTs are analogous to the reference genome; they provide a universal standard template so reference genomic coordinates are consistently annotated at the protein level. Naturally, there are many situations where annotations using a specific transcript, or multiple transcripts, are useful. The aim of the CARTs is not to impede this practice. Rather, the CART annotation serves as an anchor to ensure interoperability between different annotation systems and accurate variant frequencies. Annotations using other explicitly named transcripts should also be provided, wherever useful. We have integrated transcript data to generate CARTs for over 18,000 genes, for both GRCh37 and GRCh38, based on the associated NM and ENST identified through the CART selection process. Each CART has a unique ID and can be used individually or as a stable set of templates: CART37A for GRCh37 and CART38A for GRCh38. We have made the CARTs available on the UCSC browser and in different file formats on the Open Science Framework: https://osf.io/tcvbq/. We have also made the CARTtools software we used to generate the CARTs available on GitHub. We hope the CARTs will be useful in helping to drive transparent, stable, consistent, interoperable variant annotation.
Highlights
An integral component of next-generation sequencing (NGS) gene analysis methods is the annotation of variation using the human reference genome as a baseline
With historical gene analysis methods, such as Sanger sequencing, users can choose which sequences to use for variant annotation
The same variant annotated on GRCh38 would have genomic coordinates of chr2:73490120C>T, and would be annotated as c.8164C>T; p.Arg2722Ter using NM_015120.4 but c.8161C>T; p.Arg2721Ter in resources using reference genome-based transcripts for annotation
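The two annotations in the last highlight can be checked with simple codon arithmetic. The sketch below is illustrative only: the helper name codon_number is hypothetical, and the reading that the two coding sequences differ by three bases (one codon) upstream of the variant is an inference from the quoted positions, not something the source states.

```python
def codon_number(cds_pos: int) -> int:
    """Return the 1-based codon index that contains a 1-based coding-sequence position."""
    if cds_pos < 1:
        raise ValueError("coding-sequence positions are 1-based")
    return (cds_pos - 1) // 3 + 1


# Coding positions quoted in the highlight for the same genomic variant.
refseq_pos = 8164        # c.8164C>T using NM_015120.4
genome_based_pos = 8161  # c.8161C>T using a reference genome-based transcript

# Codon (amino-acid) numbers implied by each coding position.
refseq_codon = codon_number(refseq_pos)
genome_based_codon = codon_number(genome_based_pos)

# Difference between the two numbering systems, in bases and in codons.
base_offset = refseq_pos - genome_based_pos
codon_offset = refseq_codon - genome_based_codon

print(refseq_codon, genome_based_codon, base_offset, codon_offset)
```

Under this arithmetic, c.8164 falls in codon 2722 and c.8161 in codon 2721, which agrees with p.Arg2722Ter and p.Arg2721Ter respectively. A single variant can therefore carry two different protein-level names depending only on the transcript used, which is why the transcript needs to be stated alongside the annotation.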
Summary
An integral component of next-generation sequencing (NGS) gene analysis methods is the annotation of variation using the human reference genome as a baseline. The majority of the clinical community, and much of the clinical research community, use RefSeq NM transcripts as baseline sequences for variant annotation[1]. NGS-based gene analyses often use ENST transcripts as the baseline sequences for variant annotation. Given the intrinsic differences between the widely used variant annotation systems, it is essential that the transcripts used for variant calling are transparently provided and stably available. The CARTs aim to provide standard, interoperable, stable gene templates for variant annotation that are based on the reference genome sequence, include the required structural information, and can be used either individually or as a set. We hope the CARTs will be useful in helping to drive transparent, stable, consistent, interoperable variant annotations.