Machine learning approach for pooled DNA sample calibration.

Andrew D Hellicar, Daniel V Smith, Ashfaqur Rahman, John M Henshall

doi:10.1186/s12859-015-0593-1

Open Access

https://doi.org/10.1186/s12859-015-0593-1

Abstract

Background: Despite ongoing reduction in genotyping costs, genomic studies involving large numbers of species with low economic value (such as Black Tiger prawns) remain cost-prohibitive. In this scenario, DNA pooling is an attractive option to reduce genotyping costs. However, genotyping of pooled samples comprising DNA from many individuals is challenging due to the presence of errors that exceed the allele frequency quantisation size and therefore cannot be simply corrected by clustering techniques. The solution to the calibration problem is a correction to the allele frequency to mitigate errors incurred in the measurement process. We highlight the limitations of the existing calibration solutions, such as the fact that they impose assumptions on the variation between allele frequencies 0, 0.5, and 1.0, and address a limited set of error types. We propose a novel machine learning method to address the limitations identified.

Results: The approach is tested on SNPs genotyped with the Sequenom iPLEX platform and compared to existing state-of-the-art calibration methods. The new method is capable of reducing the mean square error in allele frequency to half that achievable with existing approaches. Furthermore, for the first time, we demonstrate the importance of carefully considering the choice of training data when using calibration approaches built from pooled data.

Conclusion: This paper demonstrates that improvements in pooled allele frequency estimates result if the genotyping platform is characterised at allele frequencies other than the homozygous and heterozygous cases. Techniques capable of incorporating such information are described along with aspects of implementation.
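To make the calibration idea concrete, the following is a minimal sketch and not the authors' method. It assumes a purely hypothetical measurement distortion (`distort`), corrects raw readings by linear interpolation between the three anchor frequencies 0, 0.5 and 1.0, and reports the mean square error before and after. All function names and data are illustrative assumptions; none come from the paper.

```python
def distort(p):
    """Hypothetical platform bias (an assumption for illustration only)."""
    return p ** 1.5  # monotonic, leaves 0 and 1 unchanged


def interpolate(x, xs, ys):
    """Piecewise-linear interpolation of x over increasing knots xs."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    for i in range(1, len(xs)):
        if x <= xs[i]:
            t = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
            return ys[i - 1] + t * (ys[i] - ys[i - 1])


def mean_square_error(estimates, truths):
    """Average squared difference between estimates and true values."""
    return sum((e - t) ** 2 for e, t in zip(estimates, truths)) / len(truths)


# Anchor points: true allele frequencies of the BB, AB and AA genotypes.
anchors = [0.0, 0.5, 1.0]
raw_at_anchors = [distort(a) for a in anchors]

# Synthetic pooled samples on a grid of true allele frequencies.
true_freqs = [i / 20 for i in range(21)]
raw = [distort(p) for p in true_freqs]

# Three-point calibration: map raw readings back through the anchors.
calibrated = [interpolate(r, raw_at_anchors, anchors) for r in raw]

mse_raw = mean_square_error(raw, true_freqs)
mse_calibrated = mean_square_error(calibrated, true_freqs)
print(f"MSE raw: {mse_raw:.5f}, MSE three-point calibrated: {mse_calibrated:.5f}")
```

In this toy setting the three-point correction is exact at the anchors but leaves residual error in between, which is the kind of gap that characterising the platform at intermediate allele frequencies is meant to close.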

Highlights

Despite ongoing reduction in genotyping costs, genomic studies involving large numbers of species with low economic value remain cost prohibitive
This is the first study of a machine learning approach to calibration of pooled SNP samples which has demonstrated the importance of training sample location on performance
The approach was tested on data generated by a Sequenom iPLEX SNP panel providing results for 61 SNPs on Tiger prawn individual and pooled samples

Summary

Introduction

Despite ongoing reduction in genotyping costs, genomic studies involving large numbers of species with low economic value (such as Black Tiger prawns) remain cost-prohibitive. In this scenario, DNA pooling is an attractive option to reduce genotyping costs. The cost benefits achieved in [1] have not been realised on platforms based on alternative technology, such as Sequenom, and pooling is still required in this scenario. This is evidenced by the ongoing use of DNA pooling in studies on low economic value species to reduce costs. In the case of DNA pooling, the ‘substances’ are the discrete SNP genotypes AA, AB, BB with corresponding A-allele frequencies 1, 1/2, 0, and the ‘concentration’ is equivalent to the real-valued A-allele frequency within the range [0, 1].
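The genotype-to-frequency mapping described above can be illustrated with a short sketch. It assumes diploid individuals that each contribute an equal amount of DNA to the pool (an idealisation, not a claim from the paper); the function names are hypothetical. Under that assumption, the true pooled A-allele frequency is the mean of the per-individual frequencies and can only take values that are multiples of 1/(2N) for a pool of N individuals.

```python
from fractions import Fraction

# A-allele frequency carried by each discrete SNP genotype.
A_ALLELE_FREQ = {"AA": Fraction(1), "AB": Fraction(1, 2), "BB": Fraction(0)}


def pooled_a_allele_frequency(genotypes):
    """True A-allele frequency of a pool, assuming equal DNA per individual."""
    if not genotypes:
        raise ValueError("pool must contain at least one individual")
    return sum(A_ALLELE_FREQ[g] for g in genotypes) / len(genotypes)


def quantisation_step(n_individuals):
    """Smallest possible change in pooled frequency for N diploid individuals."""
    return Fraction(1, 2 * n_individuals)


# Illustrative pool of five individuals (made-up genotypes).
pool = ["AA", "AB", "AB", "BB", "AA"]
freq = pooled_a_allele_frequency(pool)
step = quantisation_step(len(pool))
print(f"pooled A-allele frequency = {freq}, quantisation step = {step}")
```

As the pool grows, the quantisation step shrinks, so measurement errors larger than the step can no longer be removed by snapping a reading to the nearest allowed value.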

Journal: BMC bioinformatics
Publication Date: Jul 9, 2015
Citations: 20
License type: CC BY 4.0
