Abstract
The ability to estimate the age of the donor from recovered biological material at a crime scene can be of substantial value in forensic investigations. Aging can be complex and is associated with various molecular modifications in cells that accumulate over a person’s lifetime, including epigenetic patterns. The aim of this study was to use age-specific DNA methylation patterns to generate an accurate model for the prediction of chronological age using data from whole blood. In total, 45 age-associated CpG sites were selected based on their reported age coefficients in a previous extensive study and investigated using publicly available methylation data obtained from 1156 whole blood samples (aged 2–90 years) analysed with Illumina’s genome-wide methylation platforms (27K/450K). Applying stepwise regression for variable selection, 23 of these CpG sites were identified that could significantly contribute to age prediction modelling, and multiple regression analysis carried out with these markers provided an accurate prediction of age (R²=0.92, mean absolute error (MAE)=4.6 years). However, applying machine learning, and more specifically a generalised regression neural network model, the age prediction significantly improved (R²=0.96) with a MAE=3.3 years for the training set and 4.4 years for a blind test set of 231 cases. The machine learning approach used 16 CpG sites, located in 16 different genomic regions, with the top 3 predictors of age belonging to the genes NHLRC1, SCGN and CSNK1D. The proposed model was further tested using independent cohorts of 53 monozygotic twins (MAE=7.1 years) and a cohort of 1011 disease state individuals (MAE=7.2 years). Furthermore, we highlighted the age markers’ potential applicability in samples other than blood by predicting age with similar accuracy in 265 saliva samples (R²=0.96) with a MAE=3.2 years (training set) and 4.0 years (blind test).
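As an illustration of the modelling idea described above, the following is a minimal sketch of a generalised regression neural network (GRNN) and of the mean absolute error used to score it. It assumes the textbook GRNN formulation, in which the prediction is a Gaussian-kernel weighted average of the training targets; the function names, the smoothing parameter `sigma`, and the toy methylation values are all hypothetical and are not taken from the study, which does not describe its implementation here.

```python
import math


def grnn_predict(train_x, train_y, query, sigma=0.1):
    """Predict one target value with a textbook GRNN.

    The output is a weighted average of the training targets, where each
    weight is a Gaussian kernel of the distance between the query and a
    training pattern. `sigma` is the (assumed) smoothing parameter.
    """
    weights = []
    for x in train_x:
        sq_dist = sum((a - b) ** 2 for a, b in zip(x, query))
        weights.append(math.exp(-sq_dist / (2.0 * sigma ** 2)))
    total = sum(weights)
    if total == 0.0:
        # Query is too far from every training pattern for the kernel to
        # register; fall back to the mean of the training targets.
        return sum(train_y) / len(train_y)
    return sum(w * y for w, y in zip(weights, train_y)) / total


def mean_absolute_error(true_ages, predicted_ages):
    """Average absolute difference between true and predicted ages."""
    errors = [abs(t - p) for t, p in zip(true_ages, predicted_ages)]
    return sum(errors) / len(errors)


# Hypothetical toy data: methylation levels (0 to 1) at two CpG sites and
# the donors' chronological ages in years. Illustrative only.
train_x = [(0.20, 0.80), (0.35, 0.65), (0.50, 0.50), (0.65, 0.35), (0.80, 0.20)]
train_y = [10.0, 30.0, 50.0, 70.0, 90.0]

# Hypothetical blind test cases.
test_x = [(0.30, 0.70), (0.60, 0.40)]
test_y = [25.0, 62.0]

predictions = [grnn_predict(train_x, train_y, q, sigma=0.1) for q in test_x]
mae = mean_absolute_error(test_y, predictions)
print("predicted ages:", predictions)
print("MAE (years):", mae)
```

In this formulation there is no iterative training: the stored training samples are the model, and `sigma` controls how strongly distant samples influence a prediction. In practice it would be tuned on held-out data.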
In an attempt to create a sensitive and accurate age prediction test, a next generation sequencing (NGS)-based method able to quantify the methylation status of the selected 16 CpG sites was developed using the Illumina MiSeq® platform. The method was validated using DNA standards of known methylation levels, and the age prediction accuracy was initially assessed in a set of 46 whole blood samples. Although the resulting prediction accuracy using the NGS data was lower than that of the original model (MAE=7.5 years), it is expected that future optimization of our strategy to account for technical variation, as well as increasing the sample size, will improve both the prediction accuracy and reproducibility.
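To make the validation step above concrete, the following sketch shows one common way a per-site methylation level can be derived from sequencing read counts and compared against standards of known methylation. It assumes a bisulfite-style readout, in which methylated cytosines are read as C and unmethylated ones as T; the text above does not state the conversion chemistry, so this is an assumption. The function name and all read counts are hypothetical.

```python
def methylation_level(c_reads, t_reads):
    """Fraction of reads supporting methylation at one CpG site.

    Assumes a bisulfite-style readout: C reads count as methylated and
    T reads as unmethylated.
    """
    total = c_reads + t_reads
    if total == 0:
        raise ValueError("no coverage at this CpG site")
    return c_reads / total


# Hypothetical (C reads, T reads) for standards of known methylation level.
# These numbers are invented for illustration, not measured data.
standards = {
    0.0: (12, 1988),
    0.5: (1010, 990),
    1.0: (1975, 25),
}

for expected, (c_reads, t_reads) in standards.items():
    observed = methylation_level(c_reads, t_reads)
    deviation = observed - expected
    print(f"expected {expected:.2f}  observed {observed:.3f}  deviation {deviation:+.3f}")
```

The deviation between observed and expected levels across the standards gives a rough picture of technical bias, which is the kind of variation the authors expect future optimization to address.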
Highlights
Body fluids such as blood are amongst the most important biological evidence recovered from crime scenes
In this study we address the question of whether a methylation assay based on benchtop next-generation sequencing (NGS) of a small number of CpG sites could provide focused 5-methylcytosine quantification with base resolution, and allow for a sensitive and less costly age prediction approach with similar accuracy to genome-wide DNA methylation profiling approaches that could be applied in a forensic setting
Our study contributes to a range of already published prediction models, by providing potential age-associated markers and by introducing a novel methodology in prediction analysis, namely machine learning by artificial neural network analysis
Summary
Body fluids such as blood are amongst the most important biological evidence recovered from crime scenes. Although there have been various approaches to estimate age at death in human remains or chronological age in living individuals [5,6], most of these attempts show limitations, including low sensitivity and prediction accuracy as well as a lack of standardisation, which restrain their applicability to crime scene samples. Developing an age prediction test is a major challenge for forensic scientists, since they would need to be able to apply and validate it using minute or degraded samples consisting of a range of tissues and body fluids. The generation of reliable age prediction models is a necessity
2 Present address: Department of Genetic Identification, Erasmus MC University.