Abstract

In clinical genetic testing, checking the concordance between self-reported gender and gender inferred from genomic data is an important quality control measure, because a mismatch caused by sex chromosomal abnormalities or misregistration of clinical information can significantly affect molecular diagnosis and treatment decisions. Targeted gene sequencing (TGS) is widely recommended as a first-tier diagnostic step in clinical genetic testing. However, existing gender-inference tools are optimized for whole genome and whole exome data and are neither adequate nor accurate for TGS data. In this study, we validated a new gender-inference tool, seGMM, which uses unsupervised clustering (a Gaussian mixture model) to determine the gender of a sample. seGMM can also identify sex chromosomal abnormalities in samples by aligning the sequencing reads from the genotype data. seGMM consistently demonstrated >99% gender-inference accuracy in a publicly available 1,000-gene panel dataset from the 1000 Genomes Project, an in-house 785 hearing loss gene panel dataset of 16,387 samples, and a 187 autism risk gene panel dataset from the Autism Clinical and Genetic Resources in China (ACGC) database. The performance and accuracy of seGMM were significantly higher on TGS, whole exome sequencing (WES), and whole genome sequencing (WGS) datasets than those of existing gender-inference tools such as PLINK, seXY, and XYalign. The results of seGMM were confirmed by short tandem repeat analysis of the sex chromosome marker gene, amelogenin. Furthermore, our data showed that seGMM accurately identified sex chromosomal abnormalities in the samples. In conclusion, seGMM shows great potential in clinical genetics because it determines the sex chromosomal karyotypes of samples from massively parallel sequencing data with high accuracy.
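The clustering idea behind a Gaussian mixture model can be illustrated with a minimal sketch. This is not seGMM's implementation: the abstract does not say which features seGMM clusters on, so the example assumes a single hypothetical feature (the ratio of reads mapped to chrY over reads mapped to chrX), synthetic data, and a hand-written two-component EM fit. All function names and the 0.95 posterior cutoff are illustrative assumptions.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_two_component_gmm(values, n_iter=200, min_var=1e-6):
    """Fit a 1-D, two-component Gaussian mixture by expectation-maximization."""
    lo, hi = min(values), max(values)
    spread = (hi - lo) or 1.0
    weights = [0.5, 0.5]
    means = [lo, hi]  # start one component at each extreme of the data
    variances = [(spread / 4) ** 2, (spread / 4) ** 2]
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each sample.
        resp = []
        for x in values:
            p = [weights[k] * normal_pdf(x, means[k], variances[k]) for k in range(2)]
            total = sum(p)
            resp.append([pk / total for pk in p] if total > 0 else [0.5, 0.5])
        # M-step: re-estimate weight, mean, and variance of each component.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk < 1e-9:
                continue  # leave an empty component unchanged
            weights[k] = nk / len(values)
            means[k] = sum(r[k] * x for r, x in zip(resp, values)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, values)) / nk
            variances[k] = max(var, min_var)  # floor avoids a collapsed component
    return weights, means, variances


def infer_gender(y_to_x_ratios, cutoff=0.95):
    """Label each sample 'male', 'female', or 'ambiguous' from its Y/X read ratio.

    Assumption: the component with the higher mean Y/X ratio is the male cluster.
    """
    weights, means, variances = fit_two_component_gmm(y_to_x_ratios)
    male = 0 if means[0] > means[1] else 1
    calls = []
    for x in y_to_x_ratios:
        p = [weights[k] * normal_pdf(x, means[k], variances[k]) for k in range(2)]
        total = sum(p)
        post_male = p[male] / total if total > 0 else 0.5
        if post_male >= cutoff:
            calls.append("male")
        elif post_male <= 1 - cutoff:
            calls.append("female")
        else:
            calls.append("ambiguous")
    return calls


def simulate_ratios(n_female=50, n_male=50, seed=0):
    """Synthetic Y/X read ratios; the distribution parameters are made up."""
    rng = random.Random(seed)
    females = [rng.gauss(0.01, 0.002) for _ in range(n_female)]
    males = [rng.gauss(0.12, 0.015) for _ in range(n_male)]
    return females + males


if __name__ == "__main__":
    calls = infer_gender(simulate_ratios())
    print({label: calls.count(label) for label in sorted(set(calls))})
```

Because the model is fitted to the cohort itself rather than to fixed thresholds, the cluster boundaries adapt to the capture design. That is one plausible reason a mixture model could suit targeted panels better than tools tuned on whole genome or whole exome data, although the abstract does not state this explicitly.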

Highlights

  • PLINK analysis showed that the F coefficients of the dataset1 samples ranged from 0 to 0.9, with no gap in the F coefficients that would separate the samples into two groups (Supplementary Figure S1)

  • The XYalign plot showed that one female sample clustered with the male samples and three female samples fell between the two ellipses


Summary

Introduction

Next-generation sequencing (NGS) technology has revolutionized human biology and medicine in the last decade. NGS is routinely used in clinical genetic testing for the molecular diagnosis of hereditary disorders, infectious diseases, and immune disorders, for non-invasive prenatal genetic testing, and for personalized precision medicine, especially for cancer patients (Phillips and Douglas, 2018; Phillips et al., 2020). Clinical genetic testing is a diagnostic tool that uses genome sequencing to identify pathogenic gene mutations (genetic variants) in human diseases (McPherson, 2006). It may involve targeted gene sequencing (TGS) of single or multiple genes, whole exome sequencing (WES), or whole genome sequencing (WGS) (Di Resta et al., 2018). TGS has been used to diagnose several human diseases, including hearing loss, vision loss, cardiovascular disorders, neurologic disorders, cancer risk, and renal disorders (Lin et al., 2012; Saudi Mendeliome, 2015).

