GBScleanR: Robust genotyping error correction using a hidden Markov model with error pattern recognition.

Tomoyuki Furuta, Toshio Yamamoto, Motoyuki Ashikari, J Endelman

doi:10.1093/genetics/iyad055

Open Access

https://doi.org/10.1093/genetics/iyad055

Journal: Genetics
Publication Date: Mar 29, 2023
Citations: 4
License type: CC BY 4.0

Affiliation: Okayama University, Nagoya University

Abstract

Reduced-representation sequencing (RRS) provides cost-effective and time-saving genotyping platforms. Despite the outstanding throughput advantage of RRS, the resulting genotype data usually contain a large number of errors. Several error correction methods employing the hidden Markov model (HMM) have been developed to overcome this issue. These methods assume that markers have a uniform error rate with no bias in the allele read ratio. However, bias does occur because of uneven amplification of genomic fragments and read mismapping. In this paper, we introduce an error correction tool, GBScleanR, which enables robust and precise error correction for noisy RRS-based genotype data by incorporating marker-specific error rates into the HMM. The results indicate that, in simulation datasets, GBScleanR improves accuracy over the existing tools by more than 25 percentage points in the best case, and that it achieves the most reliable genotype estimation in real data even with error-prone markers.
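
To make the idea of marker-specific error rates concrete, the following is a minimal Python sketch, not the GBScleanR implementation. It assumes a simplified three-state HMM (homozygous reference, heterozygous, homozygous alternative), a binomial read-count emission, and a fixed genotype switch probability between adjacent markers. The function names, the `err` and `bias` parameters, and all numeric values are hypothetical and chosen only for illustration.

```python
import math

def log_emission(ref, alt, genotype, err, bias):
    """Log-likelihood of (ref, alt) read counts at one marker.

    genotype: 0 = hom. reference, 1 = heterozygous, 2 = hom. alternative.
    err:  assumed per-marker probability that a read shows the wrong allele.
    bias: assumed per-marker probability that a read from a heterozygote
          is a reference read (0.5 means no allele bias).
    The binomial coefficient is omitted because it is the same for all states.
    """
    p_ref = (1.0 - err, bias, err)[genotype]
    p_ref = min(max(p_ref, 1e-9), 1.0 - 1e-9)  # keep the logs finite
    return ref * math.log(p_ref) + alt * math.log(1.0 - p_ref)

def viterbi(reads, err, bias, switch=0.01):
    """Most likely genotype path given per-marker err and bias lists."""
    states = range(3)
    log_stay = math.log(1.0 - switch)
    log_move = math.log(switch / 2.0)
    score = [math.log(1.0 / 3.0) + log_emission(*reads[0], g, err[0], bias[0])
             for g in states]
    back = []
    for m in range(1, len(reads)):
        new_score, pointers = [], []
        for g in states:
            cands = [score[h] + (log_stay if h == g else log_move)
                     for h in states]
            best = max(states, key=cands.__getitem__)
            pointers.append(best)
            new_score.append(cands[best]
                             + log_emission(*reads[m], g, err[m], bias[m]))
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    state = max(states, key=score.__getitem__)
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path

# Toy data: a heterozygous sample; the third marker is strongly reference-biased.
reads = [(5, 5), (6, 4), (30, 1), (4, 6), (5, 5)]
n = len(reads)

# Uniform model: same error rate everywhere, no allele bias.
print(viterbi(reads, [0.01] * n, [0.5] * n))                   # [1, 1, 0, 1, 1]

# Marker-specific model: the third marker is known to favour reference reads.
print(viterbi(reads, [0.01] * n, [0.5, 0.5, 0.95, 0.5, 0.5]))  # [1, 1, 1, 1, 1]
```

In this toy case the uniform model miscalls the biased marker as homozygous reference, while supplying a marker-specific bias keeps the heterozygous call. How GBScleanR actually estimates and uses such parameters is described in the full text, not in this sketch.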

Full Text