Abstract

BackgroundNext-generation sequencing (NGS) technologies have produced large volumes of genomic data. One common operation on heterogeneous genomic data is genomic interval intersection. Most of the existing tools impose restrictions such as not allowing nested intervals or requiring intervals to be sorted when finding overlaps in two or more interval sets.ResultsWe proposed segment tree (ST) and indexed segment tree forest (ISTF) based solutions for intersection of multiple genomic interval sets in parallel. We developed these methods as a tool, Joint Overlap Analysis (JOA), which takes n interval sets and finds overlapping intervals with no constraints on the given intervals. The proposed indexed segment tree forest is a novel composite data structure, which leverages on indexing and natural binning of a segment tree. We also presented construction and search algorithms for this novel data structure. We compared JOA ST and JOA ISTF with each other, and with other interval intersection tools for verification of its correctness and for showing that it attains comparable execution times.ConclusionsWe implemented JOA in Java using the fork/join framework which speeds up parallel processing by taking advantage of all available processor cores. We compared JOA ST with JOA ISTF and showed that segment tree and indexed segment tree forest methods are comparable with each other in terms of execution time and memory usage. We also carried out execution time comparison analysis for JOA and other tools and demonstrated that JOA has comparable execution time and is able to further reduce its running time by using more processors per node. JOA can be run using its GUI or as a command line tool. JOA is available with source code at https://github.com/burcakotlu/JOA/. A user manual is provided at https://joa.readthedocs.org

Highlights

  • Next-generation sequencing (NGS) technologies have produced large volumes of genomic data

  • We propose a composite data structure, indexed segment tree forest which leverages on natural binning of segment trees and indexing the segment tree nodes at a certain depth

  • Joint Overlap Analysis (JOA) is run both with segment tree (ST) and indexed segment tree forest (ISTF) options, GROK is run through its python API and intersectionL method is utilized, BEDTools is called by its intersect -a -b utility, and for BEDOPS, –intersect set operation is used

Read more

Summary

Objectives

Our aim is to reduce the search space and search on the shorter indexed segment tree forest rather than one tall segment tree and eventually to reduce the query time

Methods
Results
Discussion
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call