Abstract
Background: Cord blood is a commonly used tissue in environmental, genetic, and epigenetic population studies due to its ready availability and potential to inform on a sensitive period of human development. However, the introduction of maternal blood during labor, or cross-contamination during sample collection, may complicate downstream analyses. After discovering maternal contamination of cord blood in Illumina 450K DNA methylation (DNAm) data from a cohort study of 150 neonates, we used a combination of linear regression and random forest machine learning to create a DNAm-based screening method. We identified a panel of DNAm sites that could discriminate between contaminated and non-contaminated samples, then designed pyrosequencing assays to pre-screen DNA before it is assayed on an array.

Results: Maternal contamination of cord blood was initially identified by unusual X chromosome DNA methylation patterns in 17 males. We used our DNAm panel to detect the contaminated male samples, as well as a proportional number of contaminated female samples, in the same cohort. We validated our DNAm screening method on an additional cohort of 189 samples using both pyrosequencing and DNAm arrays, as well as on nine publicly available cord blood 450K data sets. The contamination rate ranged from 0 to 10% across these studies, likely reflecting study-specific collection methods.

Conclusions: Maternal blood can contaminate cord blood during sample collection at appreciable levels across multiple studies. We have identified a panel of markers that can be used to identify this contamination, either post hoc, after DNAm arrays have been run, or in advance, using a targeted technique such as pyrosequencing.
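The linear-regression part of the site-screening idea can be illustrated with a minimal Python sketch; the random forest stage is omitted, so this is not the study's full pipeline. It regresses each CpG's beta values on a 0/1 contamination label and ranks sites by the absolute slope. With a binary predictor, the ordinary least-squares slope equals the difference in mean DNAm between contaminated and non-contaminated samples. The CpG IDs, toy values, and the `ols_slope` and `rank_cpgs` helpers are hypothetical.

```python
from statistics import mean


def ols_slope(x, y):
    """Ordinary least-squares slope of y regressed on a single predictor x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("predictor has no variance (all labels identical)")
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def rank_cpgs(betas, contaminated, top_k=2):
    """Rank CpGs by |slope| of beta value on contamination status (hypothetical helper).

    betas: dict mapping CpG ID -> list of beta values, one per sample
    contaminated: list of 0/1 labels in the same sample order
    Returns the top_k CpG IDs and the per-CpG slopes.
    """
    x = [float(label) for label in contaminated]
    slopes = {cpg: ols_slope(x, y) for cpg, y in betas.items()}
    ranked = sorted(slopes, key=lambda cpg: abs(slopes[cpg]), reverse=True)
    return ranked[:top_k], slopes


# Made-up toy data: three non-contaminated samples, then three contaminated ones.
labels = [0, 0, 0, 1, 1, 1]
betas = {
    "cpg_A": [0.10, 0.12, 0.11, 0.30, 0.28, 0.32],  # higher when contaminated
    "cpg_B": [0.50, 0.52, 0.49, 0.51, 0.50, 0.50],  # no group difference
    "cpg_C": [0.80, 0.82, 0.81, 0.70, 0.69, 0.71],  # lower when contaminated
}
top, slopes = rank_cpgs(betas, labels)
print(top)  # ['cpg_A', 'cpg_C']
```

In practice a ranking like this would only shortlist candidate sites; the text indicates that a random forest was also used before settling on the final panel.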
Highlights
Cord blood is a commonly used tissue in environmental, genetic, and epigenetic population studies due to its ready availability and potential to inform on a sensitive period of human development
Detection of maternal contamination: our first indication of potential maternal contamination of cord blood came from unusual patterns in the DNA methylation (DNAm) data during quality control
We divided samples into three groups based on principal component 2 (PC2) of the full data set and DNAm at cg05533223 on the X chromosome
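The three-way grouping can be sketched in Python as follows. This is a minimal illustration, not the study's procedure. It assumes PC2 scores have already been computed (for example, by PCA on the beta-value matrix). The cutoffs, the assumption that "unusual" means above the cutoff, the group labels, and the sample values are all hypothetical placeholders. Because the sign of a principal component is arbitrary, the PC2 direction may need flipping for a given data set.

```python
# Hypothetical cutoffs: the study's actual thresholds are not stated here.
PC2_CUTOFF = 0.0
BETA_CUTOFF = 0.3


def assign_group(pc2_score, x_cpg_beta, pc2_cutoff=PC2_CUTOFF, beta_cutoff=BETA_CUTOFF):
    """Place one sample into one of three groups using its PC2 score and the
    beta value at an X-linked CpG such as cg05533223.

    Assumes 'unusual' means above the cutoff on each axis.
    """
    pc2_unusual = pc2_score > pc2_cutoff
    beta_unusual = x_cpg_beta > beta_cutoff
    if pc2_unusual and beta_unusual:
        return "suspected contaminated"
    if pc2_unusual or beta_unusual:
        return "borderline"
    return "not contaminated"


def group_samples(samples, **cutoffs):
    """samples: dict mapping sample ID -> (pc2_score, x_cpg_beta)."""
    return {sid: assign_group(pc2, beta, **cutoffs) for sid, (pc2, beta) in samples.items()}


# Made-up values for illustration only.
samples = {
    "S1": (-1.2, 0.05),
    "S2": (2.5, 0.45),
    "S3": (0.8, 0.10),
}
print(group_samples(samples))
```

Keeping the cutoffs as keyword arguments lets them be tuned per data set instead of being hard-coded into the grouping logic.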
Summary
Morin et al., Clinical Epigenetics (2017) 9:75
Interest in the developmental origins of health and disease has made cord blood a popular choice for genetic, epigenetic, and environmental studies [1]. … occur relatively frequently, estimated at 2–20% of collected samples, but it makes up a very small fraction of fetal blood, with an estimated ~10⁻⁴ to 10⁻⁵ of fetal nucleated cells being of maternal origin [7–10]. This small amount of contamination should have a negligible effect on DNA or RNA analyses. Neither existing detection technique (genotyping of mother/child pairs, or FISH/TaqMan analysis of the sex chromosomes) is universally unambiguous: a mother/child pair may not be informative for the targeted genetic variants, and FISH or TaqMan analysis can only be performed on male children, because it relies on distinguishing XX maternal cells from XY child cells [5, 7–9, 11, 12].
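As a rough check on why such low maternal cell fractions should be negligible, assume a simple mixing model: the measured beta value at a CpG is a linear mixture of child and maternal DNAm, weighted by the maternal cell fraction f. This model is a simplifying assumption introduced for illustration, not one stated in the text. Because beta values lie in [0, 1], the shift from the child's true value is bounded by f:

```latex
\beta_{\text{obs}} = (1 - f)\,\beta_{\text{child}} + f\,\beta_{\text{mother}}
\;\Longrightarrow\;
\lvert \beta_{\text{obs}} - \beta_{\text{child}} \rvert
  = f\,\lvert \beta_{\text{mother}} - \beta_{\text{child}} \rvert \le f
```

With f between 10⁻⁵ and 10⁻⁴, no CpG can shift by more than 10⁻⁴ in beta units (0.01 percentage points of methylation). Under the same assumption, the shift grows linearly with f, so contamination introduced during collection at much higher fractions could visibly alter DNAm.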