GDF: Dealing with High-throughput Genotyping Multiplatform Data for Medical and Population Genetic Applications

Jorge Amigo, Antonio Salas, Javier Costas

doi:10.4172/jpb.1000206


Open Access


Abstract

Background: A number of high-throughput genotyping platforms have emerged recently. These platforms generate large amounts of genotyping data, which are subsequently processed and stored in public and/or private databases. Both the variety of platforms employed by different laboratories and the large amount of data they generate create serious data management problems for most laboratories. Some public or private software packages available today meet some important needs, but they handle the data from a point of view that the researcher may not share, and they do not allow the results to be supervised (e.g. by checking genotyping inconsistencies or summaries of the genotyping data).

Results: The main goal of the Genotyping Data Filter (GDF) software is to allow researchers to locally manage large numbers of genotypes generated by the most standard genotyping platforms, obtaining statistics and summaries of the genotyping experiments whilst keeping their data private. GDF also lets the user supervise the data, so that the researcher can easily evaluate important parameters, including the proportion of missing data in samples and single nucleotide polymorphisms (SNPs), Hardy-Weinberg equilibrium, etc. Additionally, GDF parses the raw data into the different text formats required as input by popular software packages frequently used in medical and population genetic applications.

Conclusions: GDF is a Perl program that efficiently processes data from various genotyping platforms, allowing researchers to easily inspect their own genotyping data and to parse it for a wide spectrum of well-known specialized analysis software. It has been designed to be run through a user-friendly web interface in the most common cases, but it can also be run as a local script on personal computers, or even on supercomputers for very large-scale projects.

Highlights

A number of different high throughput genotyping platforms have arisen recently
This is due to the correlation between alleles at nearby variant sites, named linkage disequilibrium (LD), that exists because of the shared ancestry of contemporary chromosomes and that is eroded by mutation and recombination [4]
From the initial efforts to characterize the human genome by studying its common variability [5,6], the HapMap Project was born as a public effort to build a map of these haplotype blocks and their htSNPs

Summary

Results

We have developed a program written in Perl, as it is one of the most popular reference programming languages for fast and comprehensive text handling [23]. As some researchers may not be comfortable with the command line, two graphical user interfaces (GUIs) have been developed to work around this issue: i) an online PHP interface to the most up-to-date version of GDF, which runs it directly on the web server without anything having to be installed locally, and ii) a Visual Basic interface that runs an encapsulated executable version of GDF, for Windows platforms only. In both cases the user gets a four-step interface: i) the data input, where all the files that are going to be used must be selected, ii) the options selection, where any of GDF's options may be chosen, iii) the formats request, where the programs for which the data should be formatted are highlighted, and iv) the final results. This last step always shows a screen output, accompanied by a link to all the files that were generated (one of which is a copy of that screen output for later inspection).
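The supervision parameters mentioned in the abstract, such as the proportion of missing data per SNP and Hardy-Weinberg equilibrium, can be illustrated with a minimal sketch. This is not GDF's code (GDF is written in Perl); it is a Python illustration under stated assumptions: genotypes are two-character strings such as "AG", a failed call is coded as "00" (a hypothetical convention), and the function names `missing_rate` and `hwe_chi_square` are introduced here for illustration only. The equilibrium check is the standard Pearson chi-square comparison of observed and expected genotype counts, which may differ from the test GDF actually applies.

```python
from collections import Counter

MISSING = "00"  # assumed code for a failed genotype call (hypothetical)


def missing_rate(genotypes):
    """Return the proportion of missing calls in a list of genotype strings."""
    if not genotypes:
        return 0.0
    return sum(g == MISSING for g in genotypes) / len(genotypes)


def hwe_chi_square(genotypes):
    """Return the Pearson chi-square statistic (1 d.f.) for a biallelic SNP.

    Returns None when there are no called genotypes or when the SNP does
    not show exactly two alleles among the called genotypes.
    """
    called = [g for g in genotypes if g != MISSING]
    n = len(called)
    alleles = sorted({allele for g in called for allele in g})
    if n == 0 or len(alleles) != 2:
        return None
    a, b = alleles
    # Sort each genotype so that "GA" and "AG" count as the same heterozygote.
    counts = Counter("".join(sorted(g)) for g in called)
    n_aa, n_ab, n_bb = counts[a + a], counts[a + b], counts[b + b]
    # Allele frequencies estimated from the observed genotype counts.
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1 - p
    observed = (n_aa, n_ab, n_bb)
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))
```

With one degree of freedom, a statistic above roughly 3.84 corresponds to a deviation from Hardy-Weinberg proportions at the 0.05 level; in practice a stricter threshold is often chosen when many SNPs are tested at once.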
