Abstract

Bioinformatics pipelines are an integral part of next-generation sequencing. Despite the rapid development of open-source software for data analysis, building bioinformatics pipelines from these tools for sequencing analysis remains a challenging and time-consuming task for academic research institutions and clinical laboratories. It requires substantial bioinformatics expertise to select appropriate analytical software, big data storage solutions and cloud infrastructure to manage the large volumes of biological data generated by high-throughput experimental technologies. We propose a bioinformatics pipeline framework for DNA sequencing analysis. The framework enables rapid and efficient deployment of the workflow pipeline to institutions and laboratories and, because it is based on virtual machine technologies, produces reproducible results. It supports both reference-based and de novo assembly (without a reference genome) for disease studies. The pipeline is flexible and supports three DNA sequencing approaches: whole-genome, whole-exome and targeted sequencing. It combines whole-genome and whole-exome sequencing to obtain significant analysis results while retaining a high positive predictive value. If the analysis fails, or if researchers are left with too many candidate results to interpret, targeted resequencing can be explored. The supported analyses are functional, structural and statistical. Because of disparate data sources, storage requirements and the need for scalable analysis of biological data, the pipeline uses big data technologies for storage and management; it can also be deployed on the cloud, giving access without the overhead of investing in additional hardware.
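The decision flow summarised above (choosing reference-based or de novo assembly, running whole-genome and whole-exome analyses, and falling back to targeted resequencing) can be sketched as follows. This is a minimal, hypothetical Python illustration under stated assumptions, not the authors' implementation: the names `Approach`, `Assembly`, `RunConfig`, `initial_runs`, `needs_targeted_followup` and the `max_candidates` threshold are introduced here purely for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Approach(Enum):
    # The three DNA sequencing approaches named in the abstract.
    WHOLE_GENOME = "WGS"
    WHOLE_EXOME = "WES"
    TARGETED = "targeted"


class Assembly(Enum):
    REFERENCE = "reference-based"
    DE_NOVO = "de novo"


@dataclass
class RunConfig:
    approach: Approach
    assembly: Assembly


def choose_assembly(reference_genome: Optional[str]) -> Assembly:
    # Use reference-based assembly when a reference genome is supplied,
    # otherwise fall back to de novo assembly.
    return Assembly.REFERENCE if reference_genome is not None else Assembly.DE_NOVO


def initial_runs(reference_genome: Optional[str]) -> List[RunConfig]:
    # First pass: whole-genome and whole-exome analyses with the same assembly mode.
    assembly = choose_assembly(reference_genome)
    return [RunConfig(Approach.WHOLE_GENOME, assembly),
            RunConfig(Approach.WHOLE_EXOME, assembly)]


def needs_targeted_followup(analysis_succeeded: bool,
                            n_candidate_results: int,
                            max_candidates: int = 50) -> bool:
    # Suggest targeted resequencing when the analysis failed or produced more
    # candidate results than researchers can reasonably interpret.
    # max_candidates is a hypothetical threshold, not a value from the paper.
    return not analysis_succeeded or n_candidate_results > max_candidates


if __name__ == "__main__":
    runs = initial_runs("GRCh38.fa")  # hypothetical reference file name
    print([(r.approach.value, r.assembly.value) for r in runs])
    print(needs_targeted_followup(analysis_succeeded=True, n_candidate_results=120))
```

Keeping the fallback decision in its own function reflects the abstract's description of targeted resequencing as a follow-up step that depends on the outcome of the first whole-genome and whole-exome pass.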
