LongGF: computational algorithm and software tool for fast and accurate detection of gene fusions by long-read transcriptome sequencing

Qian Liu, Kai Wang, Jiang F Zhong, Yu Hu, Li Fang, Andres Stucky

doi:10.1186/s12864-020-07207-4


Open Access

https://doi.org/10.1186/s12864-020-07207-4


Abstract

Background: Long-read RNA-seq techniques can generate reads that encompass a large proportion of, or the entire, mRNA/cDNA molecule, so they are expected to address inherent limitations of short-read RNA-seq techniques, which typically generate reads shorter than 150 bp. However, there is a general lack of software tools for gene fusion detection from long-read RNA-seq data that take into account the high basecalling error rates and the presence of alignment errors.

Results: In this study, we developed a fast computational tool, LongGF, to efficiently detect candidate gene fusions from long-read RNA-seq data, including cDNA sequencing data and direct mRNA sequencing data. We evaluated LongGF on tens of simulated long-read RNA-seq datasets and demonstrated its superior performance in gene fusion detection. We also tested LongGF on a Nanopore direct mRNA sequencing dataset and on a PacBio sequencing dataset generated from a mixture of 10 cancer cell lines, and found that LongGF detected known gene fusions better than existing computational tools. Furthermore, we tested LongGF on a Nanopore cDNA sequencing dataset from acute myeloid leukemia and pinpointed, at base-pair resolution, the exact location of a translocation previously known only at cytogenetic resolution; this location was further validated by Sanger sequencing.

Conclusions: In summary, LongGF will greatly facilitate the discovery of candidate gene fusion events from long-read RNA-seq data, especially in cancer samples. LongGF is implemented in C++ and is available at https://github.com/WGLab/LongGF.
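To make the idea of nominating candidate fusions from long-read alignments concrete, here is a minimal Python sketch. It is not LongGF's algorithm (LongGF is written in C++ and applies additional filters); it only shows the generic step of counting reads whose alignment records fall in two different genes. The `Alignment` record, the `candidate_fusions` function, the `min_support` parameter, and the toy reads are hypothetical, and the sketch assumes each alignment record has already been assigned to a gene using an annotation.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Alignment:
    """One alignment record of a long read (hypothetical minimal schema)."""
    read_id: str
    gene: str  # gene overlapped by the aligned segment, from an annotation lookup


def candidate_fusions(alignments, min_support=2):
    """Count distinct reads whose alignment records hit two or more genes.

    Returns a dict mapping an alphabetically ordered gene pair to the number
    of supporting reads, keeping only pairs with at least `min_support` reads.
    """
    genes_per_read = defaultdict(set)
    for aln in alignments:
        genes_per_read[aln.read_id].add(aln.gene)

    support = Counter()
    for genes in genes_per_read.values():
        # Each read counts once per gene pair, however many records it has.
        for pair in combinations(sorted(genes), 2):
            support[pair] += 1

    return {pair: n for pair, n in support.items() if n >= min_support}


# Toy alignment records, made up for illustration only.
alns = [
    Alignment("r1", "BCR"), Alignment("r1", "ABL1"),
    Alignment("r2", "BCR"), Alignment("r2", "ABL1"),
    Alignment("r3", "GAPDH"),
    Alignment("r4", "TP53"), Alignment("r4", "MDM2"),
]
print(candidate_fusions(alns))  # {('ABL1', 'BCR'): 2}
```

Read r3 maps to a single gene and yields no pair, and the pair supported only by r4 is dropped because it falls below the illustrative `min_support=2`. Requiring several independent supporting reads is one simple way to limit false candidates caused by individual error-prone reads or spurious alignments.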

Highlights

Long-read RNA-seq techniques can generate reads that encompass a large proportion of, or the entire, mRNA/cDNA molecule, so they are expected to address inherent limitations of short-read RNA-seq techniques, which typically generate reads shorter than 150 bp.
Our evaluation demonstrated that the long-read gene fusion detector (LongGF) successfully detected candidate gene fusions from long-read RNA-seq data; some of these fusions were previously known or could be validated by additional Sanger sequencing.
Please note that we do not consider these pseudogenes in the analysis by default; in LongGF, users can specify whether to use pseudogenes in gene fusion detection. (2) A long read whose two alignment records do not have an appropriate gap between their mapped bases: for example, if bases M1 to M2 of a long read are used in one alignment record and bases M3 to M4 in another, with M3 > M1, the two alignment records do not have an appropriate gap if M3 − M2 is larger than a threshold (such as 20, as shown in Fig. 1(e)) or less than −20 (as shown in Fig. 1(d)).
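The gap criterion in the last highlight can be written as a small predicate. This is a minimal Python sketch assuming each alignment record is summarized by the (start, end) read coordinates of its mapped bases, i.e. (M1, M2) and (M3, M4); the function name `has_appropriate_gap` is hypothetical, and the default `max_gap=20` is simply the example threshold given in the text, not LongGF's actual interface.

```python
def has_appropriate_gap(seg_a, seg_b, max_gap=20):
    """Return True if two alignment records of one read have an appropriate gap.

    seg_a, seg_b: (start, end) read coordinates of the bases used by each
    alignment record, i.e. (M1, M2) and (M3, M4). Hypothetical helper.
    """
    # Order the records so the first one starts earlier on the read (M1 < M3).
    (m1, m2), (m3, m4) = sorted([seg_a, seg_b])
    gap = m3 - m2  # > 0: unaligned bases between records; < 0: records overlap
    # A gap above max_gap (Fig. 1(e)) or below -max_gap (Fig. 1(d)) fails.
    return -max_gap <= gap <= max_gap


# Toy read coordinates, made up for illustration.
print(has_appropriate_gap((0, 1000), (1005, 2000)))  # True  (gap 5)
print(has_appropriate_gap((0, 1000), (1050, 2000)))  # False (gap 50)
print(has_appropriate_gap((950, 2000), (0, 1000)))   # False (gap -50 after sorting)
```

Following the wording "larger than" and "less than", a gap of exactly 20 or −20 still passes; only values beyond the threshold are rejected.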

Summary

Introduction

There is a general lack of software tools for gene fusion detection from long-read RNA-seq data that take into account the high basecalling error rates and the presence of alignment errors. Gene fusions play a critical role in transcriptome diversity and may be associated with human diseases, especially cancer. Gene fusions can be used as biomarkers for cancer diagnosis, for example in breast cancer [16] and ovarian cancer [17], and as therapeutic targets in cancer [18,19,20,21]. The ability to target and better understand gene fusions may lead to the development of novel targeted therapies.



Journal: BMC Genomics	Publication Date: Dec 1, 2020
Citations: 32	License type: open-access


Similar Papers

Data from Gene Fusion Detection and Characterization in Long-Read Cancer Transcriptome Sequencing Data with FusionSeeker
Zechen Chong ... Yu Chen
31 Mar 2023

Data from Gene Fusion Detection and Characterization in Long-Read Cancer Transcriptome Sequencing Data with FusionSeeker
Weisheng Chen ... Zhengzhi Tan
31 Mar 2023

Gene Fusion Detection and Characterization in Long-Read Cancer Transcriptome Sequencing Data with FusionSeeker
Yu Chen ... Zhengzhi Tan
Cancer Research, Vol. 83
01 Nov 2022

Detection of Cryptic Gene Fusion and Chimeric RNA Variants in Relapsed/Refractory Acute Myeloid Leukemia Patients Diagnosed with KMT2A/Afdn Chromosomal Translocation
Esther G Chong ... Xi Zhang
Blood, Vol. 142
28 Nov 2023
