Abstract
The combination of low-density SNP arrays and DNA pooling is a fast and cost-effective approach to genotyping that opens up basic genomics to a range of new applications and studies. However, we have identified significant limitations in the existing approach to calculating allele frequencies with DNA pooling. These limitations include the reduced ability of the standard interpolation method to handle SNP-to-SNP variation. Our contribution is a new hierarchical learning framework that resolves these drawbacks. The framework consists of a hierarchy of two greedily trained layers of learners. The first layer learns the bias of each SNP, then applies a calibration that reduces SNP bias by mapping all SNPs into a common coordinate system. The second layer learns an allele frequency function that exploits the global SNP data. A range of algorithms has been applied, including linear regression, neural networks and support vector regression. The framework has been tested on pooled samples of Black Tiger prawns genotyped with low-density Sequenom iPLEX panels. Analysis of the pooled samples and the corresponding individually genotyped SNP samples indicates that the pooling approach introduces an allele frequency RMS error of 0.12. The existing calibration approach corrects ~14% of this error. Our hierarchical approach is roughly 4.5 times as effective, correcting ~64% of the introduced error. This is a significant reduction and has the potential to enable genetic studies that were previously not possible because of allele frequency error. Although testing so far is limited to low-density SNP arrays, the approach was developed to generalize to other SNP genotyping technologies.
Keywords: Machine learning, DNA
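The two-layer structure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state which learner is used in which layer, so the per-SNP straight-line calibration (layer one), the cubic polynomial fitted across all SNPs (layer two), the synthetic bias model, and all function names here are assumptions chosen for illustration. The layers are trained greedily, meaning layer two is fitted only after layer one is fixed.

```python
import numpy as np


def rmse(a, b):
    """Root-mean-square error between two arrays of allele frequencies."""
    return float(np.sqrt(np.mean((np.asarray(a) - np.asarray(b)) ** 2)))


def fit_snp_calibrations(raw, true):
    """Layer 1 (assumed form): one least-squares line per SNP.

    raw, true: arrays of shape (n_pools, n_snps).
    Returns an (n_snps, 2) array of (slope, intercept) pairs.
    """
    n_snps = raw.shape[1]
    coefs = np.empty((n_snps, 2))
    for j in range(n_snps):
        # np.polyfit with deg=1 returns [slope, intercept]
        coefs[j] = np.polyfit(raw[:, j], true[:, j], deg=1)
    return coefs


def apply_snp_calibrations(raw, coefs):
    """Map every SNP into a common coordinate system."""
    return raw * coefs[:, 0] + coefs[:, 1]


def fit_global(calibrated, true, degree=3):
    """Layer 2 (assumed form): one polynomial fitted on all SNPs pooled together."""
    return np.polyfit(calibrated.ravel(), true.ravel(), deg=degree)


def predict(raw, coefs, global_poly):
    """Run both layers and clip the result to valid frequencies."""
    calibrated = apply_snp_calibrations(raw, coefs)
    return np.clip(np.polyval(global_poly, calibrated), 0.0, 1.0)


# Synthetic data only: a shared nonlinearity plus a per-SNP gain and offset.
rng = np.random.default_rng(0)
n_pools, n_snps = 40, 20
true_freq = rng.uniform(0.05, 0.95, size=(n_pools, n_snps))
gain = rng.uniform(0.6, 1.4, size=n_snps)
offset = rng.uniform(-0.1, 0.1, size=n_snps)
raw_freq = gain * true_freq ** 1.5 + offset
raw_freq += rng.normal(0.0, 0.02, size=raw_freq.shape)

# Greedy training: fit layer 1, freeze it, then fit layer 2 on its output.
snp_coefs = fit_snp_calibrations(raw_freq, true_freq)
calibrated = apply_snp_calibrations(raw_freq, snp_coefs)
global_poly = fit_global(calibrated, true_freq)
estimate = predict(raw_freq, snp_coefs, global_poly)

print("RMSE raw:          ", rmse(raw_freq, true_freq))
print("RMSE after layer 1:", rmse(calibrated, true_freq))
print("RMSE after layer 2:", rmse(estimate, true_freq))
```

The errors printed here are measured on the same synthetic data used for fitting, so they say nothing about the figures reported in the abstract; a real evaluation would hold out pools with individually genotyped reference frequencies.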