Abstract

Background: The somatic mutations that drive acute myeloid leukaemia (AML) are highly heterogeneous. Identifying the mutations that drive individual cases can help determine patient prognosis and therapy. For this reason, genetic testing for prognostically important mutations in leukaemic DNA is now routine in many diagnostic laboratories. In addition, analysis of gene-expression profiles from AML RNA can provide clinically useful information that cannot be inferred from DNA sequencing. Because carrying out both types of sequencing and analysis is expensive and impractical, we hypothesised that the two could be combined more cheaply and effectively if mutations in prognostically important genes could be identified from RNA-seq data. However, computational methodologies for robust detection of the diverse types of somatic mutations found in AML, such as substitutions, indels, tandem duplications and translocations, are not currently available.

Aims: To develop stand-alone, lightweight and user-friendly software for the identification of clinically relevant mutations from AML RNA-seq data.

Methods: To ensure efficient mapping of RNA-seq reads, we hash-indexed the DNA sequences of target genes using 10-mer sliding windows and implemented the "seed and extend" algorithm for read alignment. Point mutations and small indels were detected from reads with imperfect alignments, and tandem duplications were detected from reads spanning the duplication junction. Translocations were detected from reads whose ends belonged to preselected fusion partner genes.

Results: To benchmark our software, we used RNA-seq data from 151 whole-exome/genome-sequenced AML samples studied by The Cancer Genome Atlas Research Network.
We show that our software reliably calls clinically important mutations affecting the genes NPM1 (4-nt insertion), FLT3 (substitutions and internal tandem duplications, ITD) and MLL (partial tandem duplications, PTD), as well as substitutions in CEBPA, IDH1/2, TP53 and RUNX1. Furthermore, we identified gene fusions including PML-RARA, MYH11-CBFB, RUNX1-RUNX1T1, BCR-ABL1 and NUP98-NSD1. Our software is fast and memory efficient: it identifies the above mutations in less than 20 minutes, starting from RNA-seq FASTQ files of 100 million 50 bp paired-end reads, on a standard modern laptop computer. In addition, the software operates through a graphical user interface, making it accessible to users without programming knowledge.

Summary/Conclusion: We demonstrated that clinically important somatic mutations that drive AML can be reliably detected from RNA-seq data alone using our software. As our approach can be readily combined with conventional gene expression analyses of the same RNA-seq dataset, it can be used to generate data with enhanced clinical utility that can improve prognostication and guide patient treatment. As RNA sequencing is a straightforward procedure, our approach can readily enter clinical laboratories, where it can significantly reduce experimental costs and accelerate diagnostic work-ups.
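
Illustrative sketch: The Methods describe hash-indexing target gene sequences with 10-mer sliding windows and aligning reads by seed and extend. The Python below is a minimal sketch of that general idea, not the authors' implementation. The function names (build_index, align_read) and the max_mismatches threshold are hypothetical, and the extension step is gapless, so it can report substitutions but not indels, tandem duplications or fusions.

```python
from collections import defaultdict

K = 10  # seed length; the abstract specifies 10-mer sliding windows


def build_index(genes, k=K):
    """Hash every k-mer of each target gene to its (gene, offset) positions."""
    index = defaultdict(list)
    for name, seq in genes.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].append((name, i))
    return index


def align_read(read, genes, index, k=K, max_mismatches=2):
    """Seed with exact k-mer hits, then extend without gaps over the whole read.

    Returns (gene, start, mismatches) for the placement with the fewest
    mismatches, or None if no placement passes the (assumed) threshold.
    Each mismatch is (position_in_gene, reference_base, read_base).
    """
    best = None
    tried = set()
    for j in range(len(read) - k + 1):
        # index.get avoids creating empty entries in the defaultdict
        for name, i in index.get(read[j:j + k], ()):
            start = i - j  # implied position of the read's first base
            if (name, start) in tried:
                continue
            tried.add((name, start))
            ref = genes[name]
            if start < 0 or start + len(read) > len(ref):
                continue  # read would overhang the target sequence
            mismatches = [
                (start + p, ref[start + p], base)
                for p, base in enumerate(read)
                if ref[start + p] != base
            ]
            if len(mismatches) <= max_mismatches and (
                best is None or len(mismatches) < len(best[2])
            ):
                best = (name, start, mismatches)
    return best
```

In this sketch, a read that aligns with a non-empty mismatch list corresponds to the "imperfect alignments" from which point mutations are called. Detecting small indels would need a gapped extension step, which is left out here for brevity.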
