Abstract

Short variant discovery is one of the most important steps in genomics studies, since it allows the identification of genetic variants that influence the emergence and evolution of some diseases. Cancer in particular can be associated with germline variants as well as with somatic variants present in small cell populations, such as tumor cells. It is therefore necessary to implement workflows that analyze the data produced by next-generation sequencing while taking advantage of the resources available in HPC infrastructures. This work presents PIPEMB-WDL, a workflow for HPC infrastructure that integrates short variant discovery for germline and somatic calling, including pre-processing and variant refinement steps, following the GATK4 Best Practices. The workflow was developed with emerging technologies such as the Workflow Description Language (WDL) and the Cromwell engine. The challenges addressed in this paper are the integration and deployment of container technologies, workload managers, Cromwell, and WDL in our HPC infrastructure.