Abstract

In cancer, fusions are important diagnostic markers and targets for therapy. Long-read transcriptome sequencing allows the discovery of fusions with their full-length isoform structure. However, due to higher sequencing error rates, fusion-finding algorithms designed for short reads do not work on long-read data. Here we present JAFFAL, a method to identify fusions from long-read transcriptome sequencing. We validate JAFFAL using simulations, cell lines, and patient data from Nanopore and PacBio. We apply JAFFAL to single-cell data and find fusions spanning three genes, demonstrating that transcripts from complex rearrangements can be detected. JAFFAL is available at https://github.com/Oshlack/JAFFA/wiki.

Highlights

  • Genomic rearrangements are common in the landscape of cancer and when breakpoints occur within different genes these can be transcribed into a new hybrid transcript, producing a so-called fusion gene

  • To the best of our knowledge, only three fusion-finding methods are available for long-read transcriptome data: JAFFA [24], a pipeline we previously developed, can process transcriptome sequencing data of any length but has low sensitivity when error rates are high; Aeron [25] detects fusions by aligning long reads to a graph-based representation of the reference transcriptome; and LongGF [26] analyses genome-mapped long-read data and detects fusions by identifying reads aligning to multiple genes

  • To take advantage of new long-read sequencing technologies for fusion finding and characterization, we have developed JAFFAL, a new method which is built on the concepts developed in JAFFA and overcomes the high error rate in long-read transcriptome data by using alignment methods and filtering heuristics which are designed to handle noisy long reads
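The idea of detecting fusions from reads that align to more than one gene can be illustrated with a minimal sketch. This is not the implementation of JAFFAL, JAFFA, Aeron, or LongGF; the function name `candidate_fusions`, the `min_reads` threshold, and the toy input are all assumptions made for illustration, and real tools apply many further alignment and filtering steps.

```python
from collections import defaultdict


def candidate_fusions(alignments, min_reads=2):
    """Group reads by the set of genes they align to (hypothetical helper).

    alignments: iterable of (read_id, gene) pairs, one pair per aligned
    segment of a read. Returns {sorted gene tuple: supporting read count}
    for gene sets of size >= 2 supported by at least min_reads reads.
    """
    # Collect the distinct genes hit by each read.
    genes_by_read = defaultdict(set)
    for read_id, gene in alignments:
        genes_by_read[read_id].add(gene)

    # A read touching two or more genes supports a candidate fusion.
    support = defaultdict(set)
    for read_id, genes in genes_by_read.items():
        if len(genes) >= 2:
            support[tuple(sorted(genes))].add(read_id)

    # Keep only candidates with enough supporting reads.
    return {
        genes: len(reads)
        for genes, reads in support.items()
        if len(reads) >= min_reads
    }


# Made-up toy input, not real data.
toy = [
    ("read1", "BCR"), ("read1", "ABL1"),
    ("read2", "BCR"), ("read2", "ABL1"),
    ("read3", "TP53"),
    ("read4", "EML4"), ("read4", "ALK"),
]
fusions = candidate_fusions(toy)
print(fusions)
```

Because candidates are keyed by the full set of genes a read touches, the same grouping would also report a read spanning three genes as a single three-gene candidate.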



Introduction

Genomic rearrangements are common in the landscape of cancer, and when breakpoints occur within different genes these can be transcribed into a new hybrid transcript, producing a so-called fusion gene. Fusions may drive cancer through activation of oncogenes [1] or inactivation of tumor suppressors. Often such fusions are recurrent across patient cohorts, and novel drugs have been developed to target a number of them [2]. Massively parallel short-read transcriptome sequencing has greatly expanded our knowledge of fusion genes across cancers and is increasingly being used for clinical diagnostics [3,4,5]. The Cancer Genome Atlas (TCGA) utilized short-read transcriptome sequencing across a range of tumor types to estimate that approximately 16% of cancers have a fusion event which drives the disease [6]. Since the advent of the first approaches [7,8], fusion finding has improved in both accuracy and speed, and there are numerous tools available [9,10,11,12].
