Abstract

Background: The potential utility of the Burrows-Wheeler transform (BWT) of a large amount of short-read data ("reads") has not been fully studied. The BWT of reads serves as a lossless dictionary of the reads, unlike the heuristic and lossy reads-to-genome mapping results conventionally obtained in the first step of sequence analysis. It is therefore expected to support the development of sensitive methods for analyzing short-read data. One of the most active recent areas of research in sequence analysis is the sensitive detection of rare genomic rearrangements from whole-genome sequencing (WGS) data of heterogeneous cancer samples. This study addresses the application of the BWT of reads to the analysis of genomic rearrangements.

Results: A new method for sensitive detection of genomic rearrangements by using the BWT of reads is proposed. It consists of three steps. First, breakpoint regions, which contain breakpoints and are joined together by rearrangement, are predicted from the distribution of so-called discordant pairs by using a variant of the conjugate gradient method. Second, reads partially matching the breakpoint regions are collected from the BWT of reads. Third, breakpoints are detected as branching points among the collected reads, and their precise positions are determined. The method was experimentally implemented, and its performance (i.e., sensitivity and specificity) was evaluated by using simulated data with known artificial rearrangements. It was also applied to publicly available real biological WGS data of cancer patients, and the detection results were compared with published results.

Conclusions: Serving as a lossless dictionary of reads, the BWT of short reads enables sensitive analysis of genomic rearrangements in heterogeneous cancer-genome samples when used in conjunction with breakpoint-region predictions based on a conjugate gradient method.

Highlights

  • The potential utility of the Burrows-Wheeler transform (BWT) of a large amount of short-read data ("reads") has not been fully studied

  • The authors showed that the BWT of reads enables ultrafast analysis of single nucleotide polymorphisms (SNPs) [7]

  • The edit distance for the tumor sample shows a characteristic change as described in the previous section


Summary

Introduction

The potential utility of the Burrows-Wheeler transform (BWT) of a large amount of short-read data ("reads") has not been fully studied. This study addresses the application of the BWT of reads to the analysis of genomic rearrangements. The advent of so-called next-generation sequencers (NGS) created a challenging problem: developing ultrafast genome-mapping tools that can cope with the unprecedented flood of short-read data [3]. That problem has largely been solved by popular short-read mapping tools, many of which owe their superior performance to the BWT of reference genomes [4,5]. Cox et al. [9] proposed large-scale compression of genomic sequence data using the BWT of reads. Janin et al. [10] proposed adaptive large-scale reference-free compression of base-calling quality scores, also using the BWT of reads. The authors showed that the BWT of reads enables ultrafast analysis of single nucleotide polymorphisms (SNPs) [7].
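
To make the idea of the BWT of reads as a lossless dictionary more concrete, the following is a minimal Python sketch. It is not the authors' implementation. It assumes a toy set of reads, a `$` terminator appended to each read, and naive suffix sorting, which would be far too slow for real WGS data. The function names (`bwt_of_reads`, `build_index`, `count_occurrences`) are hypothetical. The sketch builds the BWT of the concatenated reads and uses the standard backward search to count how often a substring occurs across all reads, without mapping them to a reference genome.

```python
from collections import Counter


def bwt_of_reads(reads):
    """Return the BWT of all reads concatenated, each terminated by '$'."""
    text = "".join(read + "$" for read in reads)
    # Naive suffix sorting: acceptable for a toy example only.
    suffix_array = sorted(range(len(text)), key=lambda i: text[i:])
    # The BWT character is the one preceding each sorted suffix
    # (text[-1] is '$' for the suffix starting at position 0).
    return "".join(text[i - 1] for i in suffix_array)


def build_index(bwt):
    """Build the two tables used by backward search.

    first[c]  = number of characters in the BWT that sort before c
    occ[c][i] = number of occurrences of c in bwt[:i]
    """
    counts = Counter(bwt)
    first, total = {}, 0
    for ch in sorted(counts):
        first[ch] = total
        total += counts[ch]
    occ = {ch: [0] for ch in counts}
    for b in bwt:
        for ch in occ:
            occ[ch].append(occ[ch][-1] + (b == ch))
    return first, occ


def count_occurrences(pattern, bwt, first, occ):
    """Count occurrences of a pattern (over A, C, G, T) in all reads."""
    lo, hi = 0, len(bwt)
    # Backward search: extend the match one character at a time,
    # starting from the last character of the pattern.
    for ch in reversed(pattern):
        if ch not in first:
            return 0
        lo = first[ch] + occ[ch][lo]
        hi = first[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo


if __name__ == "__main__":
    reads = ["ACGTAC", "GTACGG", "TTACGT"]  # toy reads, assumed for illustration
    bwt = bwt_of_reads(reads)
    first, occ = build_index(bwt)
    for query in ["TAC", "ACG", "GG", "AAA"]:
        print(query, count_occurrences(query, bwt, first, occ))
```

Because the pattern never contains `$`, a match cannot span two reads, so the count reflects occurrences inside individual reads. Production tools would replace the naive sort and the dense `occ` table with specialized construction algorithms and compressed rank structures.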
