Abstract
Detection of small copy number variations (CNVs) in clinically relevant genes is routinely used to aid diagnosis. We recently developed a tool, CNV-RF, capable of detecting clinically relevant CNVs with a high degree of sensitivity. The CNV-RF implementation was designed for small gene panels and did not scale to large ones. Analyses of large gene panels with several hundred genes routinely failed due to memory limitations on a single computer and, when successful, took on average over 24 hours, making the tool impractical for routine clinical use. A reliable tool is needed that can accurately identify clinically relevant CNVs on large gene panels within a more practical time frame. We have developed Hadoop-CNV-RF, a freely available, scalable, and more user-friendly implementation of CNV-RF capable of rapidly analyzing large datasets. Hadoop-CNV-RF takes advantage of Hadoop, a framework developed to analyze large amounts of data. Through its implementation, we demonstrate the feasibility of developing scalable pipelines on Hadoop that integrate popular bioinformatics software developed for traditional single-user computers, without the need for special-purpose routines developed for Hadoop. Results show that Hadoop-CNV-RF reduces analysis time on large gene panels from over 24 hours to about 4 hours on a 20-node Hadoop cluster. Additionally, we demonstrate its ability to scale by analyzing a whole-exome dataset with close to a billion reads. Hadoop-CNV-RF has been clinically validated for large gene panels (up to 4800 genes) and is currently being used in the clinic. It is publicly available at https://github.com/getiria-onsongo/hadoopcnvrf-public.