Abstract
Background
Next-generation sequencing (NGS) technologies have made it possible to exhaustively detect structural variations (SVs) in genomes. Although various methods for detecting SVs have been developed, the global structure of chromosomes, i.e., how segments in a reference genome are extracted and ordered in an unknown target genome, cannot be inferred by detecting only individual SVs.

Results
Here, we formulate the problem of inferring the global structure of chromosomes from SVs as an optimization problem on a bidirected graph. This problem takes into account the aberrant adjacencies of genomic regions, the copy numbers, and the number and length of chromosomes. Although the problem is NP-complete, we propose a polynomial-time solvable variant by restricting instances of the problem using a biologically meaningful condition, which we call the weakly connected constraint. We also explain how to obtain experimental data that satisfy the weakly connected constraint.

Conclusion
Our results establish a theoretical foundation for the development of practical computational tools that could be used to infer the global structure of chromosomes based on SVs. The computational complexity of the inference can be reduced by detecting the segments of the reference genome at the ends of the chromosomes of the target genome and also the segments that are known to exist in the target genome.
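To make the bidirected-graph formulation more concrete, here is a minimal Python sketch of one possible encoding: each reference segment has two ends, an adjacency is an unordered pair of ends (a bidirected edge), and copy numbers count how often each segment must appear. The encoding (tail end `"T"`, head end `"H"`) and the helpers `adjacency`, `reference_adjacencies`, and `is_consistent` are hypothetical and not taken from the paper. The sketch only checks whether a given candidate target genome is consistent with observed adjacencies and copy numbers; it does not model the paper's optimization objective, its terms for chromosome number and length, or the weakly connected constraint.

```python
from collections import Counter
from typing import Dict, FrozenSet, List, Set, Tuple

# Hypothetical encoding: each reference segment has a tail end "T" (left in
# reference coordinates) and a head end "H" (right in reference coordinates).
End = Tuple[int, str]
Adjacency = FrozenSet[End]   # unordered pair of ends = one bidirected edge
Oriented = Tuple[int, str]   # (segment index, "+" forward or "-" reversed)


def adjacency(left: Oriented, right: Oriented) -> Adjacency:
    """Return the pair of segment ends joined when `right` follows `left`."""
    a, strand_a = left
    b, strand_b = right
    exit_end = (a, "H") if strand_a == "+" else (a, "T")
    entry_end = (b, "T") if strand_b == "+" else (b, "H")
    # A fold-back (segment a "+" then a "-") joins an end to itself and
    # yields a one-element frozenset, i.e. a self-loop.
    return frozenset({exit_end, entry_end})


def reference_adjacencies(n_segments: int) -> Set[Adjacency]:
    """Adjacencies present in the reference: head of i joined to tail of i + 1."""
    return {frozenset({(i, "H"), (i + 1, "T")}) for i in range(n_segments - 1)}


def is_consistent(
    chromosomes: List[List[Oriented]],
    copy_numbers: Dict[int, int],
    allowed: Set[Adjacency],
) -> bool:
    """Check a candidate target genome against copy numbers and allowed adjacencies."""
    # 1. Each segment must appear exactly as many times as its copy number
    #    (zero-copy segments are dropped so plain dict equality works).
    used = Counter(seg for chrom in chromosomes for seg, _ in chrom)
    expected = {seg: cn for seg, cn in copy_numbers.items() if cn > 0}
    if dict(used) != expected:
        return False
    # 2. Every junction between consecutive segments must be an allowed edge.
    for chrom in chromosomes:
        for left, right in zip(chrom, chrom[1:]):
            if adjacency(left, right) not in allowed:
                return False
    return True


if __name__ == "__main__":
    # Toy example: segment 1 of a four-segment reference is deleted, so the
    # observed aberrant adjacency joins the head of segment 0 to the tail of 2.
    allowed = reference_adjacencies(4) | {frozenset({(0, "H"), (2, "T")})}
    copy_numbers = {0: 1, 1: 0, 2: 1, 3: 1}
    print(is_consistent([[(0, "+"), (2, "+"), (3, "+")]], copy_numbers, allowed))  # True
    print(is_consistent([[(0, "+"), (1, "+"), (2, "+"), (3, "+")]], copy_numbers, allowed))  # False
```

Representing each adjacency as an unordered pair of segment ends is what makes the graph bidirected in this sketch: a segment can be entered from either end, so inversions and fold-backs need no special-case handling.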
Highlights
Next-generation sequencing (NGS) technologies have made it possible to exhaustively detect structural variations (SVs) in genomes
We address the problem of inferring the global structure of chromosomes from SV data, which in this study comprise aberrant adjacencies of genomic regions and copy number variations (CNVs)
Detecting only individual SVs cannot reveal the global structure of chromosomes
Summary
Next-generation sequencing (NGS) technologies have made it possible to exhaustively detect structural variations (SVs) in genomes. Although various methods for detecting SVs have been developed, the global structure of chromosomes, i.e., how segments in a reference genome are extracted and ordered in an unknown target genome, cannot be inferred by detecting only individual SVs. NGS technologies have drastically reduced the cost of genome sequencing [1]. As more genomic sequences have become available, it has become clear that genomes contain many SVs, which include large insertions, deletions, tandem duplications, and translocations. In some cases, many SVs are concentrated in a small region of the genome [4,5,6].