Abstract
Background: Accurate genotype calling for high-throughput Illumina data is an important step in extracting more genetic information for large-scale genome-wide association studies. Many popular calling algorithms use mixture models to infer genotypes for a large number of single nucleotide polymorphisms (SNPs) quickly and efficiently. In practice, mixture models are mostly restricted to common SNPs, whose minor allele frequencies are relatively large. Accurately genotyping rare variants remains challenging, especially when the boundaries between their genotype clusters are not clearly defined.
Results: To further improve call accuracy and genotype quality for rare variants, a new calling procedure, named M-D, is proposed to infer genotypes from Illumina BeadArray data. The procedure integrates a Gaussian Mixture Model and a Dirichlet Process Gaussian Mixture Model to infer genotypes.
Conclusions: Applications to Illumina data illustrate that this new approach can improve calling performance compared to other popular genotyping algorithms.
Highlights
Accurate genotype calling for high-throughput Illumina data is an important step in extracting more genetic information for large-scale genome-wide association studies
Several popular calling algorithms have been designed for the Illumina platform, such as BEAGLE with the BEAGLECALL software [6], CRLMM [7, 8], GenCall [9], GenoSNP [10], and Illuminus [11]
Illumina BeadArray data: The Illumina Omni BeadArray chip collects over one million single nucleotide polymorphisms (SNPs) per sample, and increasingly covers newly identified variants
Summary
Accurate genotype calling for high-throughput Illumina data is an important step in extracting more genetic information for large-scale genome-wide association studies. Many popular calling algorithms use mixture models to infer genotypes for a large number of single nucleotide polymorphisms quickly and efficiently. With the rapid development of biotechnology, a leading producer, Illumina [5], can offer SNP arrays with very wide coverage of genetic variants in a fast and cost-efficient way. This manufacturer's arrays generate a large amount of high-dimensional intensity data, and powerful genotyping algorithms are needed to infer genotypes accurately. Illumina chips catalog millions of SNPs and process a large number of samples in parallel, so genotyping algorithms for Illumina data are of main interest.
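To make the idea of mixture-model genotype calling concrete, the following is a minimal sketch of a plain three-component Gaussian mixture fitted by EM to a single SNP. It is not the M-D procedure and does not include the Dirichlet Process component. The input is assumed to be a normalized allelic contrast in [0, 1] for each sample (values near 0 for AA, near 0.5 for AB, near 1 for BB). The function names, starting means, variance floor, and the 0.95 posterior threshold are all hypothetical choices made for illustration.

```python
import math

GENOTYPES = ("AA", "AB", "BB")


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_gmm(values, means=(0.1, 0.5, 0.9), n_iter=50, min_var=1e-4):
    """Fit a 1-D Gaussian mixture by EM (starting means are an assumption)."""
    k = len(means)
    means = list(means)
    variances = [0.01] * k
    weights = [1.0 / k] * k
    n = len(values)
    for _ in range(n_iter):
        # E-step: posterior probability of each genotype cluster per sample.
        resp = []
        for x in values:
            dens = [weights[j] * normal_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(dens) or 1e-300  # guard against numerical underflow
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means, and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            if nj < 1e-8:
                continue  # effectively empty cluster: keep its old mean and variance
            means[j] = sum(r[j] * x for r, x in zip(resp, values)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, values)) / nj
            variances[j] = max(var, min_var)  # floor keeps tiny clusters from collapsing
    return weights, means, variances


def call_genotypes(values, params, min_posterior=0.95):
    """Assign the most probable genotype, or "NC" (no call) if it is not confident."""
    weights, means, variances = params
    calls = []
    for x in values:
        dens = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
        total = sum(dens)
        if total == 0.0:
            calls.append("NC")  # far from every cluster
            continue
        best = max(range(len(dens)), key=dens.__getitem__)
        calls.append(GENOTYPES[best] if dens[best] / total >= min_posterior else "NC")
    return calls


# Made-up contrast values for one SNP with a single minor-allele homozygote.
theta = [0.02, 0.03, 0.05, 0.04, 0.06, 0.01, 0.03, 0.05,
         0.48, 0.52, 0.50, 0.47,
         0.97]
params = fit_gmm(theta)
calls = call_genotypes(theta, params)
print(calls)
```

In this toy data the BB cluster contains a single sample, so its variance estimate rests entirely on the hypothetical floor `min_var`. That fragility is one illustration of why a fixed three-component mixture fitted per SNP can struggle with rare variants.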