Abstract

Low-coverage next-generation sequencing methodologies are routinely employed to genotype large populations. Missing data in these populations manifest both as missing markers and as markers with incomplete allele recovery. False homozygous calls at heterozygous sites, which result from incomplete allele recovery, confound many existing imputation algorithms. These systematic errors can be minimized by incorporating depth-of-sequencing read coverage into the imputation algorithm. Accordingly, we developed Low-Coverage Biallelic Impute (LB-Impute) to resolve missing data issues. LB-Impute uses a hidden Markov model that incorporates marker read coverage to determine variable emission probabilities. LB-Impute reliably produced robust, highly accurate imputation results, even at extremely low (<1×) average per-marker coverage. This finding has implications for the design of future genotype imputation algorithms. LB-Impute is publicly available on GitHub at https://github.com/dellaporta-laboratory/LB-Impute.
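To illustrate why read coverage matters for emission probabilities, the sketch below computes a simple binomial likelihood of the observed read counts under each of three genotype states at a biallelic marker. It is not taken from the LB-Impute source code: the function names, the binomial model, and the per-read error rate `err=0.05` are all assumptions chosen for illustration.

```python
from math import comb

GENOTYPES = ("hom_ref", "het", "hom_alt")


def emission_probs(ref_reads, alt_reads, err=0.05):
    """Binomial likelihood of the observed read counts under each genotype.

    err is an assumed per-read error rate (hypothetical value, not from the paper).
    """
    n = ref_reads + alt_reads
    # Probability that a single read shows the alternate allele, per genotype.
    p_alt = {"hom_ref": err, "het": 0.5, "hom_alt": 1.0 - err}
    return {
        g: comb(n, alt_reads) * p_alt[g] ** alt_reads * (1.0 - p_alt[g]) ** ref_reads
        for g in GENOTYPES
    }


def het_weight(ref_reads, alt_reads, err=0.05):
    """Share of the total likelihood assigned to the heterozygous state."""
    probs = emission_probs(ref_reads, alt_reads, err)
    return probs["het"] / sum(probs.values())


# Only reference reads are observed; watch the heterozygous weight fall with depth.
for depth in (0, 1, 2, 6):
    print(depth, round(het_weight(depth, 0), 3))
```

Under these assumptions, a marker covered by a single reference read still leaves the heterozygous state with considerable weight, because a true heterozygote shows only one allele half the time at depth 1. At depth 6 the same all-reference observation nearly rules out heterozygosity. A marker with no reads yields equal likelihoods for all states, so it contributes no information and the call rests entirely on neighboring markers through the transition model.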
