Abstract

The International HapMap Project provides a resource of genotypic data on single nucleotide polymorphisms (SNPs), which can be used in association studies to identify the genetic determinants of phenotypic variation. Before an association study, the HapMap dataset should be preprocessed to reduce computation time and control the multiple testing problem. Less informative SNPs, including those with a very low genotyping rate and those with a rare minor allele in one or more populations, are removed. Some research designs use SNPs from only a subset of HapMap cell lines. Although the HapMap website and other association software packages provide some basic tools for optimizing these datasets, a fast and user-friendly program that outputs filtered genotypic data would be beneficial for association studies. Here, we present a flexible, straightforward bioinformatics program that prepares HapMap genotypic data for association studies by letting the user specify cell lines and two common filtering criteria: minor allele frequency and genotyping rate. The software was developed for Microsoft Windows and written in C++. The Windows executable and the Microsoft Visual C++ source code are available at Google Code (http://hapmap-filter-v1.googlecode.com/) or upon request. Their distribution is subject to the GNU General Public License v3.

Highlights

  • The International HapMap Project [1] provides a resource of genotypic data of more than 3.1 million single nucleotide polymorphisms (SNPs) [2] for human lymphoblastoid cell lines (LCLs) derived from individuals of European (CEU: Caucasians from Utah, USA), African (YRI: Yoruba people from Ibadan, Nigeria) and Asian ancestry (CHB: Han Chinese from Beijing, China and JPT: Japanese from Tokyo, Japan)

  • Because the large number of SNPs makes the multiple comparisons problem severe, and because a whole genome association study can require a long running time, the raw genotypic data downloaded from the HapMap website [8] requires some degree of preprocessing that includes, but is not limited to, removing uninformative and biased SNPs

  • We wrote a C++ program, HapMap Filter, using Microsoft Visual C++ 6.0 for Windows to generate a high-quality HapMap SNP dataset that is ready to be used in association studies

Summary

Introduction

The International HapMap Project [1] provides a resource of genotypic data of more than 3.1 million single nucleotide polymorphisms (SNPs) [2] for human lymphoblastoid cell lines (LCLs) derived from individuals of European (CEU: Caucasians from Utah, USA), African (YRI: Yoruba people from Ibadan, Nigeria) and Asian ancestry (CHB: Han Chinese from Beijing, China and JPT: Japanese from Tokyo, Japan). Association studies using the HapMap genotypic data have generated new insights into the genetic determinants responsible for the risks of common diseases as well as quantitative phenotypes such as gene expression and individual response to therapeutic drug treatments [3,4,5,6,7]. Because the large number of SNPs makes the multiple comparisons problem severe, and because a whole genome association study can require a long running time, the raw genotypic data downloaded from the HapMap website [8] requires some degree of preprocessing that includes, but is not limited to, removing uninformative and biased SNPs. The HapMap website [8] provides a web-based interface for users to extract genotypic data on each population by setting parameters including population, minor allele frequency (MAF), SNP location and genomic regions.
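
The two filtering criteria named in the abstract, minor allele frequency and genotyping rate, can be illustrated with a minimal C++ sketch. This is not the HapMap Filter source code. The names `SnpStats`, `computeStats` and `passesFilter` are hypothetical, and the sketch assumes that each genotype call is a two-character string such as "AG", that a missing call is written "NN", and that the vector passed in already holds only the calls for the cell lines of interest.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Summary statistics for one SNP across the selected samples (hypothetical type).
struct SnpStats {
    double genotypingRate;  // fraction of samples with a non-missing call
    double maf;             // frequency of all non-major alleles among called alleles
};

// Assumption: each call is two characters (e.g. "AG"); "NN" or any call
// containing 'N' is treated as missing. These conventions are illustrative.
SnpStats computeStats(const std::vector<std::string>& calls) {
    std::map<char, int> alleleCounts;
    std::size_t called = 0;
    for (const std::string& g : calls) {
        if (g.size() != 2 || g[0] == 'N' || g[1] == 'N') continue;  // skip missing
        ++called;
        ++alleleCounts[g[0]];
        ++alleleCounts[g[1]];
    }
    SnpStats stats{0.0, 0.0};
    if (called == 0) return stats;  // no usable calls: both values stay 0
    stats.genotypingRate = static_cast<double>(called) / calls.size();

    // For a biallelic SNP, (total - major count) / total is the minor allele
    // frequency; a monomorphic SNP gets 0.
    int total = 0;
    int major = 0;
    for (const auto& entry : alleleCounts) {
        total += entry.second;
        major = std::max(major, entry.second);
    }
    stats.maf = static_cast<double>(total - major) / total;
    return stats;
}

// Keep a SNP only if it meets both user-chosen thresholds.
bool passesFilter(const std::vector<std::string>& calls,
                  double minMaf, double minGenotypingRate) {
    assert(minMaf >= 0.0 && minMaf <= 0.5);
    assert(minGenotypingRate >= 0.0 && minGenotypingRate <= 1.0);
    const SnpStats stats = computeStats(calls);
    return stats.genotypingRate >= minGenotypingRate && stats.maf >= minMaf;
}
```

For example, with the calls "AA", "AG", "GG" and "NN", three of four samples are genotyped and each allele is observed three times, so the SNP passes a filter of MAF ≥ 0.05 and genotyping rate ≥ 0.7 but fails one that requires a genotyping rate of at least 0.8. The threshold values here are arbitrary illustrations, not recommendations from the paper.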
