Abstract

Fusion genes derived from genomic rearrangements play a key role in cancer initiation, so the discovery of novel gene fusions may be of significant importance for cancer diagnosis and treatment. Next-generation sequencing technology provides a sensitive and efficient way to identify gene fusions at the genomic level. However, existing methods, which rely only on unmapped reads or discordantly aligned fragments, still face many challenges and limitations. In this work we developed GFusion, a novel method that uses RNA-Seq data to identify fusion genes. The pipeline performs multiple alignments and applies a strict filtering algorithm to improve sensitivity and reduce the false positive rate. GFusion successfully detected 34 of 43 previously reported fusions in four cancer datasets. We also demonstrated the effectiveness of GFusion on a simulated dataset of 24 million 76 bp paired-end reads containing 42 artificial fusion genes, of which GFusion discovered 37. Compared with existing methods, GFusion showed higher sensitivity and a lower false positive rate. The GFusion pipeline is freely available for non-commercial purposes at https://github.com/xiaofengsong/GFusion.
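The two detection counts above can be turned into sensitivity figures with a few lines of arithmetic. The sketch below only restates the numbers given in the abstract (34 of 43 and 37 of 42); the helper name `sensitivity` is introduced here for illustration and is not part of GFusion.

```python
def sensitivity(detected: int, total: int) -> float:
    """Fraction of known fusions that were recovered (true positive rate)."""
    if total <= 0:
        raise ValueError("total must be positive")
    return detected / total


# Counts taken from the abstract; no other data is assumed.
reported = sensitivity(34, 43)   # previously reported fusions, four cancer datasets
simulated = sensitivity(37, 42)  # artificial fusions in the simulated reads

print(f"reported fusions:  {reported:.1%}")   # 79.1%
print(f"simulated fusions: {simulated:.1%}")  # 88.1%
```

Sensitivity alone says nothing about false positives, which is why the abstract reports the false positive rate as a separate comparison.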

Highlights

  • The existence of fusion genes in breast, lung, colon and prostate cancers and in colorectal lymphoma has been confirmed in numerous studies [8,9,10,11]

  • FusionMap builds a pseudo fusion transcript library from reads that span fusion boundaries and remaps full-length reads to this pseudo reference, while TopHat-Fusion uses a series of post-processing routines to filter out false fusions

  • Current pipelines depend heavily on individual unmapped reads that harbor the fusion boundaries, or on discordant paired-end reads whose two mates align to different genes; as a result, they neglect the mate reads of unmapped reads and of reads that span fusion boundaries


Summary

Introduction

The existence of fusion genes in breast, lung, colon and prostate cancers and in colorectal lymphoma has been confirmed in numerous studies [8,9,10,11]. FusionSeq detects candidate fusion transcripts by analyzing paired-end RNA-Seq data. However, its running cost is comparatively high in terms of running time and CPU usage because the fusion junction library is normally quite large. Both FusionMap and TopHat-Fusion can be applied to single-end or paired-end reads using a similar strategy: reads are split into shorter segments, and segments that align to different genes are selected. To improve the sensitivity and specificity of fusion detection and to avoid the above limitations, we present GFusion, a powerful and efficient fusion detection pipeline for both paired-end and single-end RNA-Seq data that predicts fusion genes by comprehensively analyzing split-read alignments and mate reads. Further results showed that GFusion achieved higher sensitivity and a lower false positive rate than other existing fusion detection pipelines.
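The split-read strategy described above (cut a read into shorter segments, then look for segments that align to different genes) can be illustrated with a toy sketch. This is not the implementation used by GFusion, FusionMap or TopHat-Fusion: the transcript sequences and gene names are invented, and exact substring matching stands in for a real short-read aligner. All function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def split_read(read: str, seg_len: int) -> List[str]:
    """Cut a read into consecutive, non-overlapping segments of seg_len bases.

    A trailing remainder shorter than seg_len is dropped.
    """
    return [read[i:i + seg_len] for i in range(0, len(read) - seg_len + 1, seg_len)]


def locate(segment: str, transcripts: Dict[str, str]) -> Optional[str]:
    """Return the one gene whose transcript contains the segment exactly.

    Returns None when the segment matches no gene or more than one gene.
    Exact matching is an assumption made to keep the sketch self-contained.
    """
    hits = [gene for gene, seq in transcripts.items() if segment in seq]
    return hits[0] if len(hits) == 1 else None


def candidate_fusion(read: str, transcripts: Dict[str, str],
                     seg_len: int) -> Optional[Tuple[str, str]]:
    """Report (gene_5prime, gene_3prime) if the first and last uniquely
    mapped segments of the read come from different genes."""
    genes = [locate(seg, transcripts) for seg in split_read(read, seg_len)]
    mapped = [g for g in genes if g is not None]
    if len(mapped) >= 2 and mapped[0] != mapped[-1]:
        return (mapped[0], mapped[-1])
    return None


# Invented toy transcripts; real references would come from an annotation.
toy_transcripts = {
    "GENE_A": "ACGTACGGTCAAGCTT",
    "GENE_B": "TTGGCCAATCGGATCC",
}

# A chimeric read: 8 bases from GENE_A followed by 8 bases from GENE_B.
fusion_read = toy_transcripts["GENE_A"][4:12] + toy_transcripts["GENE_B"][2:10]
# A normal read taken entirely from GENE_A.
normal_read = toy_transcripts["GENE_A"]

fusion_call = candidate_fusion(fusion_read, toy_transcripts, seg_len=8)
normal_call = candidate_fusion(normal_read, toy_transcripts, seg_len=8)
print(fusion_call)
print(normal_call)
```

In this sketch a read whose boundary falls inside a segment leaves that segment unmapped, which hints at why approaches that also use mate reads, as the introduction proposes, can recover evidence that segment matching alone misses.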

