Hadoop-CNV-RF

Getiria Onsongo, Ham Ching Lam, Matthew Bower, Bharat Thyagarajan

doi:10.1145/3388440.3414861

Abstract

Detection of small copy number variations (CNVs) in clinically relevant genes is routinely used to aid diagnosis. We recently developed a tool, CNV-RF, capable of detecting clinically relevant CNVs with a high degree of sensitivity. The original CNV-RF implementation was designed for small gene panels and did not scale to large gene panels. Analyzing large gene panels with several hundred genes routinely failed due to memory limitations on a single computer, and, when successful, analysis took on average over 24 hours, making it impractical for routine use in the clinic. We need a reliable tool capable of accurately identifying clinically relevant CNVs on large gene panels within a more practical time frame. We have developed Hadoop-CNV-RF, a freely available, scalable, and more user-friendly implementation of CNV-RF capable of rapidly analyzing large datasets. Hadoop-CNV-RF takes advantage of Hadoop, a framework developed to analyze large amounts of data. In its implementation, we demonstrate the feasibility of developing scalable pipelines on Hadoop that integrate popular bioinformatics software developed for use on traditional single-user computers without the need for special-purpose routines developed for Hadoop. Results show that Hadoop-CNV-RF reduces analysis time on large gene panels from over 24 hours to about 4 hours on a 20-node Hadoop cluster. Additionally, we demonstrate its ability to scale by analyzing a whole-exome dataset with close to a billion reads. Hadoop-CNV-RF has been clinically validated for large gene panels (up to 4800 genes) and is currently being used in the clinic. It is publicly available at https://github.com/getiria-onsongo/hadoopcnvrf-public.
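The abstract's point about reusing single-machine bioinformatics software on Hadoop without Hadoop-specific routines can be illustrated with a minimal sketch. It assumes a Hadoop Streaming style interface, where a mapper reads tab-separated records on standard input and writes tab-separated records on standard output; the abstract does not state which Hadoop interface Hadoop-CNV-RF actually uses. The function name `map_records`, the record layout, and the stand-in tool `DEMO_TOOL` are hypothetical and are not taken from the Hadoop-CNV-RF code base.

```python
import subprocess
import sys
from typing import Iterable, Iterator, List

# Hypothetical stand-in for an unmodified command-line tool: it reads its
# input on stdin and prints the number of characters it received.
DEMO_TOOL: List[str] = [
    sys.executable,
    "-c",
    "import sys; print(len(sys.stdin.read()))",
]


def map_records(lines: Iterable[str], tool_cmd: List[str]) -> Iterator[str]:
    """Run an external tool once per input record.

    Each input line is assumed to look like "key<TAB>payload". The payload
    is written to the tool's stdin, and the tool's stdout is emitted as
    "key<TAB>result". The tool needs no knowledge of Hadoop.
    """
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank records
        key, _, payload = line.partition("\t")
        completed = subprocess.run(
            tool_cmd,
            input=payload,
            capture_output=True,
            text=True,
            check=True,  # raise if the tool exits with a non-zero status
        )
        yield f"{key}\t{completed.stdout.strip()}"


# Small in-memory demonstration; no cluster is involved.
for record in map_records(["chr1\tACGT\n", "chr2\tAC\n"], DEMO_TOOL):
    print(record)
```

In a streaming job, the same function would be called with `sys.stdin` as the input, so that each node processes only its share of the records. Launching one process per record is a simplification; a real pipeline would more likely batch records per invocation.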

