Identifying body fluid stains at crime scenes is an important task in forensic evidence analysis. Body fluid-specific CpGs detected by DNA methylation microarray screening have been widely studied for forensic body fluid identification. However, some CpGs have limited ability to distinguish certain body fluid types, so there is an ongoing need to discover novel methylation markers and to validate them fully in order to strengthen their evidentiary value in complex forensic scenarios. This study gathered forensically relevant DNA methylation microarray data from the Gene Expression Omnibus (GEO) database. A novel marker screening strategy was developed that combines feature selection algorithms (elastic net, information gain ratio, Random Forest feature importance, and mutual information coefficient) with epigenetic pattern analysis to identify CpG markers for body fluid identification. The selected CpGs were validated by pyrosequencing on peripheral blood, saliva, semen, vaginal secretion, and menstrual blood samples, and machine learning classification models were constructed from the sequencing results. Pyrosequencing revealed 14 CpGs with high specificity across the five body fluid types. A machine learning classification model built on the pyrosequencing results effectively distinguished the five body fluid types, achieving 100% accuracy on the test set. Body fluid stains could also be identified effectively using only six CpG markers. Our research proposes a systematic strategy for screening body fluid-specific CpGs and contributes new insights and methods to forensic body fluid identification.
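As a rough illustration of the idea of combining several feature rankings, the sketch below keeps only the CpGs that rank in the top k under every scoring method. It is not the authors' pipeline: the two scores used here (mutual information on binarized beta values, and the largest mean methylation gap between one fluid and all other samples) are simple stand-ins for the four algorithms named in the abstract. The function names, the 0.5 threshold, the CpG identifiers, and the toy beta values are all hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(values, labels, threshold=0.5):
    """Mutual information (bits) between a binarized beta value and the fluid label.

    Assumption: beta values are binarized at `threshold` (hypothetical choice).
    """
    n = len(values)
    bins = [v >= threshold for v in values]
    joint = Counter(zip(bins, labels))
    bin_counts = Counter(bins)
    label_counts = Counter(labels)
    mi = 0.0
    for (b, y), count in joint.items():
        p_xy = count / n
        mi += p_xy * log2(p_xy / ((bin_counts[b] / n) * (label_counts[y] / n)))
    return mi


def max_delta_beta(values, labels):
    """Largest gap between one fluid's mean beta and the mean of all other samples."""
    best = 0.0
    for fluid in set(labels):
        inside = [v for v, y in zip(values, labels) if y == fluid]
        outside = [v for v, y in zip(values, labels) if y != fluid]
        gap = abs(sum(inside) / len(inside) - sum(outside) / len(outside))
        best = max(best, gap)
    return best


def top_k(scores, k):
    """Return the k feature names with the highest scores."""
    return set(sorted(scores, key=scores.get, reverse=True)[:k])


def consensus_markers(beta, labels, k):
    """Keep CpGs that are in the top k under every scoring method."""
    scorers = (mutual_information, max_delta_beta)
    selected = [
        top_k({cpg: score(vals, labels) for cpg, vals in beta.items()}, k)
        for score in scorers
    ]
    return sorted(set.intersection(*selected))


# Toy data (invented for illustration): two samples per fluid, four candidate CpGs.
labels = ["blood", "blood", "saliva", "saliva", "semen", "semen"]
beta = {
    "cpg_A": [0.90, 0.85, 0.10, 0.15, 0.10, 0.12],  # high only in blood
    "cpg_B": [0.10, 0.12, 0.10, 0.11, 0.90, 0.88],  # high only in semen
    "cpg_C": [0.50, 0.45, 0.55, 0.50, 0.48, 0.52],  # roughly flat
    "cpg_D": [0.40, 0.60, 0.45, 0.55, 0.60, 0.40],  # noisy, not fluid-specific
}

markers = consensus_markers(beta, labels, k=2)
print(markers)  # ['cpg_A', 'cpg_B']
```

Taking the intersection is a deliberately strict choice: a CpG survives only if every method agrees it is informative. A rank-sum or voting scheme would be a more lenient alternative.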