Abstract

Detection of somatic mosaicism in non-proliferative cells is a new challenge in genome research; however, the accuracy of current detection strategies remains uncertain because no ground truth is available. Here, we present a set of ultra-deep whole-exome sequencing (WES) data based on reference standards generated from cell line mixtures, providing a total of 386,613 mosaic single-nucleotide variants (SNVs) and insertion-deletion mutations (INDELs) with variant allele frequencies (VAFs) ranging from 0.5% to 56%, as well as 35,113,417 non-variant and 19,936 germline variant sites as negative controls. Because cell lines with established genotypes were mixed progressively, the reference standard set as a whole mimics the cumulative acquisition of mosaic variants, as occurs during early development, and yields 741 possible inter-sample relationships with respect to variant sharing and asymmetry in VAFs. We expect that our reference data will be essential for optimizing the current use of mosaic variant detection strategies and for developing algorithms to enable future improvements.

Highlights

  • After conception, postzygotic mutations continuously occur throughout life in humans, causing somatic mosaicism in an individual[1,2]

  • Successful application to mosaicism has been obstructed by many challenges, such as low variant allele frequencies (VAF < 10%)[14,17,20,21] and ambiguity in the use of a control[14,17]

  • Unlike conventional somatic mutations, calling of mosaic variants is susceptible to two different types of errors: (1) calling non-variant sites and (2) calling germline variants, the latter of which is caused by the unreliability of controls

Summary

Background & Summary

Postzygotic mutations continuously occur throughout life in humans, causing somatic mosaicism in an individual[1,2]. We generated robust, large-scale, cell line mixture-based reference standards using 386,613 single-nucleotide variants (SNVs) and insertion-deletion mutations (INDELs) as positive controls and 35,133,353 negative control positions (35,113,417 non-variant and 19,936 germline variant sites). When MRC5 was employed as an internal reference, each of the five remaining cell lines (RPE, CCD-18co, HBEC30-KT, THLE-2, and FHC) carried a set of variants unique to that cell line; these sets were called V1 to V5, respectively (Fig. 1a; see Table 1 for the full list). Unlike calling of conventional somatic mutations, calling of mosaic variants is susceptible to two different types of errors: (1) calling non-variant sites (e.g., sites carrying only the reference allele) and (2) calling germline variants, the latter of which is caused by the unreliability of controls (e.g., variants shared with control samples). Our data constitute one of the most comprehensive, versatile, and robust reference standards ever constructed for variant analysis.
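
The two error types described above can be tallied separately when a caller is benchmarked against the positive and negative controls. The following is a minimal Python sketch under stated assumptions, not the authors' evaluation pipeline: the `Variant` tuple, the `evaluate_calls` function, and the convention of keying negative-control sites by (chromosome, position) are hypothetical names and choices introduced here for illustration only.

```python
from typing import Iterable, NamedTuple, Set, Tuple


class Variant(NamedTuple):
    """Hypothetical record for one called or reference variant."""
    chrom: str
    pos: int
    ref: str
    alt: str


Site = Tuple[str, int]  # (chromosome, position); an assumed keying convention


def evaluate_calls(
    calls: Iterable[Variant],
    positives: Set[Variant],
    nonvariant_sites: Set[Site],
    germline_sites: Set[Site],
) -> dict:
    """Split calls into true positives and the two false-positive types."""
    call_set = set(calls)
    tp = len(call_set & positives)   # called and in the positive control set
    fn = len(positives - call_set)   # positive controls that were missed

    fp_nonvariant = fp_germline = unassessed = 0
    for v in call_set - positives:
        site = (v.chrom, v.pos)
        if site in germline_sites:
            fp_germline += 1         # error type (2): germline variant called
        elif site in nonvariant_sites:
            fp_nonvariant += 1       # error type (1): non-variant site called
        else:
            unassessed += 1          # outside every control set; not scored

    fp = fp_nonvariant + fp_germline
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / len(positives) if positives else 0.0
    return {
        "tp": tp,
        "fn": fn,
        "fp_nonvariant": fp_nonvariant,
        "fp_germline": fp_germline,
        "unassessed": unassessed,
        "precision": precision,
        "recall": recall,
    }
```

Calls that fall outside all three control sets are counted as unassessed rather than as false positives, since the reference standard makes no statement about them. This is a design choice of the sketch and may differ from how a given benchmark treats such calls.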
