Abstract
The non-coding regions of tumour cell genomes harbour a considerable fraction of total DNA sequence variation, but the functional contribution of these variants to tumorigenesis is ill-defined. Of these non-coding variants, somatic insertions are among the least well characterized because short-read DNA sequences are difficult to interpret. Here, using a combination of ChIP-seq to enrich enhancer DNA and a computational approach with multiple DNA alignment procedures, we identify enhancer-associated small insertion variants. Among the 102 tumour cell genomes we analyse, small insertions are frequently observed in enhancer DNA sequences near known oncogenes. Further study of one insertion, somatically acquired in primary leukaemia tumour genomes, reveals that it nucleates formation of an active enhancer that drives expression of the LMO2 oncogene. The approach described here to identify enhancer-associated small insertion variants provides a foundation for further study of these abnormalities across human cancers.
Highlights
There is recent evidence that somatically acquired small insertions and deletions (INDELs) can nucleate oncogenic enhancer activity[8], but this form of variation can be overlooked because sequencing technologies generally produce short reads that can be challenging to align to the reference genome[2,10,11].
We subjected a random subset of 68 enhancer-associated insertion candidates in MOLT4 T cell acute lymphoblastic leukemia (T-ALL) cells to high-throughput sequencing, which confirmed that 48 (71%) of the predicted insertions were present in these tumour genomes (Fig. 2b, Supplementary Table 2).
Summary
The non-coding regions of tumour cell genomes harbour a considerable fraction of total DNA sequence variation, but the functional contribution of these variants to tumorigenesis is ill-defined. We propose an alternative strategy to identify bona fide non-coding driver mutations by analysis of sequencing reads from chromatin immunoprecipitation of the enhancer-associated histone mark H3K27ac (H3K27ac ChIP-Seq). This approach has an intrinsic advantage over whole-genome sequencing approaches to identifying functional variants because H3K27ac sequence reads are generated predominantly from active regulatory sites, providing a more direct link between the variant and putative function[14,15]. Using this approach, a catalogue of 328,871 candidate enhancer-associated insertions (Supplementary Data 1), which range in size from 1 to 31 bp (Fig. 1B, Supplementary Fig. 1D), was identified. A heterozygous 8 base pair (bp) insertion in T cell leukaemias, proximal to the LMO2 oncogene, is shown to affect gene control. This knowledge of enhancer-associated insertions provides a foundation for further studies to define the oncogenic contributions of this class of variants.
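As a rough illustration of how small insertions can be read off aligned ChIP-Seq reads, the sketch below walks a SAM-style CIGAR string and reports insertions within the 1 to 31 bp range mentioned above. It is not the pipeline used in the study, which combines multiple alignment procedures; the function name `find_insertions`, its parameters, and the example read are hypothetical, and real data would also need filtering by mapping quality, read support, and overlap with H3K27ac peaks.

```python
import re

# Size bounds assumed from the 1 to 31 bp range reported in the text.
MIN_INS, MAX_INS = 1, 31

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def find_insertions(pos, cigar, seq, min_len=MIN_INS, max_len=MAX_INS):
    """Return (ref_pos, inserted_bases) for each insertion in one aligned read.

    pos   : 1-based leftmost reference position of the alignment (SAM POS)
    cigar : SAM CIGAR string
    seq   : read sequence (SAM SEQ)
    ref_pos is the 1-based reference base immediately before the insertion.
    """
    ref = pos  # next reference position to be consumed
    qry = 0    # next read offset to be consumed (0-based)
    found = []
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op == "I":
            # Insertion: consumes read bases only.
            if min_len <= n <= max_len:
                found.append((ref - 1, seq[qry:qry + n]))
            qry += n
        elif op in "M=X":
            # Match or mismatch: consumes both reference and read.
            ref += n
            qry += n
        elif op in "DN":
            # Deletion or skipped region: consumes reference only.
            ref += n
        elif op == "S":
            # Soft clip: consumes read bases only.
            qry += n
        # H (hard clip) and P (padding) consume neither.
    return found


# Hypothetical read carrying an 8 bp insertion after reference position 104.
print(find_insertions(100, "5M8I5M", "AAAAACCCCGGGGTTTTT"))
```

Restricting the scan to reads from an H3K27ac ChIP-Seq experiment is what would tie each candidate to an active regulatory site; the CIGAR walk itself is generic and applies to any aligned read.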