Abstract

Interspersed duplicated insertion (idINS) is a common type of genomic insertion and plays an important role in genomic instability during cancer genesis. Nevertheless, detecting this type of insertion is challenging, because reads originating from idINS regions in the donor sample are most likely to map perfectly to other regions of the reference. Most existing approaches adopt paired-end mapping to detect idINSs, but characterizing idINSs larger than the mean insert size remains difficult because of the short length of sequencing reads. There is therefore still a need for practical algorithms that detect and infer idINSs regardless of their length. Here, we present a new algorithm, called DIPins, which can accurately detect idINSs and infer their contents from paired-end reads. DIPins is capable of detecting breakpoint positions and inferring idINS contents even when the length of the variation exceeds the paired-end insert size. Its core principle is twofold: it extracts multiple signatures from split reads and integrates them to determine idINS positions, and it adopts a dynamic process that constructs idINS contents by iteratively generating unobserved split reads from the restricted area around the idINS breakpoint. We test the performance of DIPins on both simulated and real data. The results demonstrate its advantages over other methods and its potential for the accurate characterization of idINSs in the human genome.
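
To make the notion of a split-read signature concrete, the following minimal sketch shows one common kind of signature: soft-clipped alignments whose clip sites pile up at the same reference coordinate. It is not the DIPins implementation. The function names, the `min_support` and `window` parameters, and the input format (pairs of 1-based SAM position and CIGAR string) are all assumptions made for illustration.

```python
import re
from collections import Counter

# Matches one CIGAR operation, e.g. "50M" or "20S" (SAM specification operators).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def clip_positions(pos, cigar):
    """Return the reference coordinates at which a read is soft-clipped.

    `pos` is the 1-based leftmost aligned position (SAM convention).
    Hypothetical helper, not part of DIPins.
    """
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    sites = []
    if ops and ops[0][1] == "S":
        # Left clip: the alignment starts at the candidate breakpoint.
        sites.append(pos)
    # Number of reference bases consumed by the alignment.
    ref_len = sum(n for n, op in ops if op in "MDN=X")
    if len(ops) > 1 and ops[-1][1] == "S":
        # Right clip: the first reference base past the aligned part.
        sites.append(pos + ref_len)
    return sites


def candidate_breakpoints(reads, min_support=2, window=2):
    """Cluster clip sites and keep clusters with enough supporting reads.

    Returns a list of (position, support) tuples, where position is the
    most frequently observed clip site in the cluster.
    """
    counts = Counter()
    for pos, cigar in reads:
        counts.update(clip_positions(pos, cigar))

    # Group clip sites that lie within `window` bases of the previous site.
    clusters = []
    for site in sorted(counts):
        if clusters and site - clusters[-1][-1] <= window:
            clusters[-1].append(site)
        else:
            clusters.append([site])

    result = []
    for cluster in clusters:
        support = sum(counts[s] for s in cluster)
        if support >= min_support:
            best = max(cluster, key=lambda s: counts[s])
            result.append((best, support))
    return result


# Toy input: three reads clipped at coordinate 150, one unclipped read,
# and one isolated clip at 500 that falls below the support threshold.
reads = [
    (100, "50M50S"),
    (120, "30M70S"),
    (150, "40S60M"),
    (300, "100M"),
    (500, "20S80M"),
]
print(candidate_breakpoints(reads))
```

A real caller would integrate several signatures, as the abstract describes, rather than rely on clip-site counts alone. The sketch only illustrates how one such signature can nominate a breakpoint position.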
