Abstract

Recent developments have led to an enormous increase in publicly available genomic data, including complete genomes. The 1000 Genomes Project was a major contributor, releasing the results of sequencing a large number of individual genomes and enabling many large-scale studies of human genetic variation. However, currently available tools are insufficient for some analyses of data sets encompassing more than hundreds of base pairs, and for analyses of haplotype sequences of single nucleotide polymorphisms (SNPs). Here, we present a new and powerful tool for large data sets that computes a variety of summary statistics of population genetic data and increases the speed of data analysis.

Highlights

  • The most widely used software packages, such as DnaSP [1] and Arlequin [2], cannot handle the data formats adopted by massive re-sequencing projects

  • It became imperative to develop powerful tools for analyzing genetic variation in large-scale data stored in the Variant Call Format (VCF), which was developed by the 1000 Genomes Project and has since been adopted by other projects such as UK10K, dbSNP and the NHLBI Exome Project [3,4,5]

  • The software described here is a new tool for working efficiently with DNA sequences and polymorphism data, such as those recently released in VCF format

Introduction

The most widely used software packages, such as DnaSP [1] and Arlequin [2], cannot handle the data formats adopted by massive re-sequencing projects. We have developed a new and robust algorithm, implemented in the DivStat software, which runs in Linux/Unix, Macintosh and Windows environments. The program provides both a command-line shell and a user-friendly graphical interface, reducing the learning curve for users less familiar with shell commands. The tool can be applied to either polymorphism data or DNA sequences. It can sequentially compute a variety of summary statistics of population genetic data over a "sliding window". The window is slid across the surveyed area and new similar
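The sliding-window idea can be illustrated with a minimal Python sketch. It is not DivStat's implementation: the function names, the window parameters, and the choice of statistics (number of segregating sites S and mean pairwise differences pi) are assumptions made for illustration, and haplotypes are assumed to be given as strings with one allele per SNP.

```python
from itertools import combinations


def sliding_windows(start, end, size, step):
    """Yield half-open (window_start, window_end) intervals over [start, end)."""
    pos = start
    while pos < end:
        yield pos, min(pos + size, end)
        pos += step


def window_stats(positions, haplotypes, start, end, size, step):
    """Return (window_start, window_end, S, pi) for each window.

    positions  : SNP coordinates, one per column of the haplotypes
    haplotypes : list of equal-length strings, one allele per SNP
    S          : number of segregating sites in the window
    pi         : mean number of pairwise differences in the window
    (Hypothetical helper, not part of DivStat.)
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("at least two haplotypes are required")
    n_pairs = n * (n - 1) // 2
    results = []
    for w_start, w_end in sliding_windows(start, end, size, step):
        # Columns whose SNP coordinate falls inside the current window
        idx = [i for i, p in enumerate(positions) if w_start <= p < w_end]
        # A site is segregating if more than one allele is observed
        seg = sum(1 for i in idx if len({h[i] for h in haplotypes}) > 1)
        # Total differences summed over all pairs of haplotypes
        diffs = sum(
            sum(1 for i in idx if a[i] != b[i])
            for a, b in combinations(haplotypes, 2)
        )
        results.append((w_start, w_end, seg, diffs / n_pairs))
    return results


# Toy data: four haplotypes typed at four SNPs (illustrative only)
positions = [5, 12, 18, 27]
haplotypes = ["0101", "0011", "1001", "0001"]
results = window_stats(positions, haplotypes, start=0, end=30, size=20, step=10)
for row in results:
    print(row)
```

With a window size of 20 and a step of 10, consecutive windows overlap by half, so each SNP contributes to up to two windows. This sketch rescans all positions for every window, which is adequate for a toy example but would need an indexed lookup for genome-scale data.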
