Abstract

High-throughput sequencing technologies are a milestone in molecular biology: by enabling the deposit of large volumes of biological data in public databases, they have driven great advances in genomics. The availability of these data has made comparative genomic analysis possible through pipelines that use the entire gene repertoire of genomes. However, public databases contain a large number of unfinished genomes, approximately 16-fold more than complete genomes, which introduces bias into comparative analyses. The present work therefore proposes Pan4Draft, an automated pipeline for pan-genomic analysis of draft prokaryotic genomes that uses sequencing reads to maximize the representation and accuracy of the gene repertoire of unfinished genomes. Pan4Draft supports comparative analyses under different designs: combining complete and draft genomes, using only draft genomes, or using only complete genomes. Pan4Draft is available at http://www.computationalbiology.ufpa.br/pan4drafts and the test dataset is available at https://sourceforge.net/projects/pan4drafts.
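To make the bias described above concrete, the following minimal sketch partitions a set of genomes into pan, core, and accessory gene families and shows how a single gene missing from a draft assembly shrinks the core genome. It is an illustration only, not Pan4Draft's implementation: the function name `partition_pan_genome`, the genome labels, and the gene lists are hypothetical toy data, and real pipelines cluster genes by sequence similarity rather than matching names.

```python
from typing import Dict, Set, Tuple


def partition_pan_genome(
    genomes: Dict[str, Set[str]]
) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return (pan, core, accessory) gene-family sets.

    pan       = families found in at least one genome
    core      = families found in every genome
    accessory = pan minus core
    """
    if not genomes:
        raise ValueError("at least one genome is required")
    repertoires = list(genomes.values())
    pan = set().union(*repertoires)
    core = set.intersection(*repertoires)
    accessory = pan - core
    return pan, core, accessory


# Hypothetical toy data: two complete genomes and one draft genome.
complete = {
    "complete_A": {"dnaA", "gyrB", "recA", "rpoB"},
    "complete_B": {"dnaA", "gyrB", "recA", "rpoB", "tetM"},
}
# Assumption: recA falls in an assembly gap of the deposited draft.
draft_as_deposited = {"draft_C": {"dnaA", "gyrB", "rpoB"}}
# Assumption: the gene is recovered once the reads are reused.
draft_after_recovery = {"draft_C": {"dnaA", "gyrB", "recA", "rpoB"}}

_, core_biased, _ = partition_pan_genome({**complete, **draft_as_deposited})
_, core_recovered, _ = partition_pan_genome({**complete, **draft_after_recovery})

print(sorted(core_biased))     # ['dnaA', 'gyrB', 'rpoB']
print(sorted(core_recovered))  # ['dnaA', 'gyrB', 'recA', 'rpoB']
```

In this toy case the gene absent from the draft is wrongly counted as accessory, which is the kind of distortion that recovering genes from the sequencing reads is meant to reduce.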

Highlights

  • The primary questions that can be addressed by comparative genomics involve understanding the evolutionary processes of organisms, the relationship between conserved DNA sequences encoding important functional proteins, and identifying non-coding sequences and proteins with non-essential functions[3]

  • We present Pan4Draft, a user-friendly graphical computational tool that aims to improve the accuracy and increase the number of genes represented in draft genomes, which benefits pan-genomic analyses of unfinished genomes

  • The user must supply the parameters of the assembler, the aligner, and the pan-genomic analysis tool
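The last highlight names three stages that each need user-supplied parameters. The sketch below shows one way such settings could be grouped and checked before a run. Everything in it is hypothetical: Pan4Draft is described as a graphical tool, and the class `PipelineParameters`, its field names, and the example keys (`kmer_size`, `threads`) are illustrative assumptions, not options of Pan4Draft or of any specific assembler or aligner.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PipelineParameters:
    """Hypothetical container for the three parameter groups."""

    assembler: Dict[str, str] = field(default_factory=dict)
    aligner: Dict[str, str] = field(default_factory=dict)
    pan_genome_tool: Dict[str, str] = field(default_factory=dict)

    def missing_stages(self) -> List[str]:
        """List the stages that have no parameters yet."""
        stages = {
            "assembler": self.assembler,
            "aligner": self.aligner,
            "pan_genome_tool": self.pan_genome_tool,
        }
        return [name for name, params in stages.items() if not params]


# Illustrative values only; the keys are not real tool flags.
params = PipelineParameters(
    assembler={"kmer_size": "31"},
    aligner={"threads": "4"},
)

print(params.missing_stages())  # ['pan_genome_tool']
```

Checking all three groups up front means a missing setting is reported before any assembly or alignment work starts.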


Summary

Introduction

The primary questions that can be addressed by comparative genomics involve understanding the evolutionary processes of organisms, the relationship between conserved DNA sequences encoding important functional proteins, and identifying non-coding sequences and proteins with non-essential functions[3]. Zhang and colleagues (2015) carried out comparative analyses to identify essential genes, which were subsequently evaluated for their presence in genomic islands[6]. Other studies use these approaches to identify gene rearrangements, gene duplication, and gene acquisition by lateral gene transfer[7]. The increase in the number of genomes, complete or draft, is more pronounced in bacteria[10], owing to the compact nature of their genomes and their application in diverse fields such as biotechnology, agriculture, and medicine. We present Pan4Draft, a user-friendly graphical computational tool that aims to improve the accuracy and increase the number of genes represented in draft genomes, which benefits pan-genomic analyses of unfinished genomes. Besides the automatic integration of different tools, Pan4Draft allows the [...]

www.nature.com/scientificreports/

