Abstract

Long noncoding RNAs (lncRNAs) represent a vast unexplored genetic space that may hold missing drivers of tumourigenesis, but few such “driver lncRNAs” are known. Until now, they have been discovered through changes in expression, which makes it difficult to distinguish causative roles from passenger effects. Here we present a different approach to driver lncRNA discovery, using mutational patterns in tumour DNA. Our pipeline, ExInAtor, identifies genes with an excess load of somatic single nucleotide variants (SNVs) across panels of tumour genomes. Heterogeneity in mutational signatures between cancer types and individuals is accounted for using a simple local trinucleotide background model, which yields high precision and low computational demands. We use ExInAtor to predict drivers from the GENCODE annotation across 1112 entire genomes from 23 cancer types. Using a stratified approach, we identify 15 high-confidence candidates: 9 novel and 6 known cancer-related genes, including MALAT1, NEAT1 and SAMMSON. Both known and novel driver lncRNAs are distinguished by elevated gene length, evolutionary conservation and expression. We present a first catalogue of mutated lncRNA genes driving cancer, which will grow and improve with the application of ExInAtor to future tumour genome projects.

Highlights

  • The majority of GENCODE long noncoding RNA (lncRNA) annotations are spliced (21,523/23,898 = 90.0% of transcripts), and we assume throughout that their functional sequence resides in the exonic regions that are incorporated into the mature transcript[20]

  • We hypothesised that driver lncRNAs will display an excess of somatic mutations in exons compared to the local background mutational rate, estimated from their introns and flanking genomic regions, together referred to as “background regions”
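
The exon-versus-background comparison in this hypothesis can be illustrated with a small calculation. The sketch below is not the ExInAtor implementation: it assumes a plain one-sided hypergeometric test, treats every position as either mutated or not, and omits the trinucleotide matching of the background described in the abstract. The function name `exon_enrichment_pvalue` and its arguments are hypothetical, introduced only for illustration.

```python
from math import comb


def exon_enrichment_pvalue(exon_len, exon_mut, bg_len, bg_mut):
    """Probability of seeing at least `exon_mut` mutated exonic positions.

    Assumption (not from the source): mutated positions are drawn without
    replacement from the pooled exonic and background positions, so the
    exonic count follows a hypergeometric distribution.
    """
    total_len = exon_len + bg_len      # all positions considered for this gene
    total_mut = exon_mut + bg_mut      # all mutated positions observed
    upper = min(exon_len, total_mut)   # largest possible exonic count

    # Sum the upper tail P(X = k) for k = exon_mut .. upper using exact integers.
    tail = sum(
        comb(exon_len, k) * comb(bg_len, total_mut - k)
        for k in range(exon_mut, upper + 1)
    )
    return tail / comb(total_len, total_mut)


# Hypothetical gene: 1 kb of exons, 9 kb of introns and flanking sequence.
p_enriched = exon_enrichment_pvalue(1000, 10, 9000, 10)
p_typical = exon_enrichment_pvalue(1000, 1, 9000, 19)
print(f"exon-enriched gene: p = {p_enriched:.3g}")
print(f"background-like gene: p = {p_typical:.3g}")
```

A small p-value indicates more exonic mutations than the local background rate would predict. In practice such p-values would also need correction for multiple testing across all genes examined.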

Summary

Introduction

Long noncoding RNAs (lncRNAs) are amongst the most numerous, yet most poorly understood, noncoding elements of the genome. The absence of whole-genome maps of somatic mutations has meant that searches for new cancer-related lncRNAs have relied on conventional transcriptomic approaches, which reveal the changes in lncRNA expression levels that accompany cancer. Such approaches cannot distinguish passenger from driver effects, nor do they identify mutations in the mature lncRNA sequence that may drive tumourigenesis independently of upstream regulatory changes[8,12,13]. ActiveDriver[18] searches for genes with excess mutations falling in signaling sites, protein domains and regulatory motifs. While such approaches have discovered dozens of new cancer genes, their use of features specific to protein-coding genes to infer mutational biases makes them inapplicable to lncRNAs.
