Abstract
Environmentally induced epigenetic transgenerational inheritance of disease and phenotypic variation involves germline transmitted epimutations. The primary epimutations identified involve altered differential DNA methylation regions (DMRs). Different environmental toxicants have been shown to promote exposure (i.e., toxicant) specific signatures of germline epimutations. Analysis of genomic features associated with these epimutations identified low-density CpG regions (<3 CpG / 100bp) termed CpG deserts and a number of unique DNA sequence motifs. The rat genome was annotated for these and additional relevant features. The objective of the current study was to use a machine learning computational approach to predict all potential epimutations in the genome. A number of previously identified sperm epimutations were used as training sets. A novel machine learning approach using a sequential combination of Active Learning and Imbalance Class Learner analysis was developed. The transgenerational sperm epimutation analysis identified approximately 50K individual sites with a 1 kb mean size and 3,233 regions that had a minimum of three adjacent sites with a mean size of 3.5 kb. A select number of the most relevant genomic features were identified with the low density CpG deserts being a critical genomic feature of the features selected. A similar independent analysis with transgenerational somatic cell epimutation training sets identified a smaller number of 1,503 regions of genome-wide predicted sites and differences in genomic feature contributions. The predicted genome-wide germline (sperm) epimutations were found to be distinct from the predicted somatic cell epimutations. Validation of the genome-wide germline predicted sites used two recently identified transgenerational sperm epimutation signature sets from the pesticides dichlorodiphenyltrichloroethane (DDT) and methoxychlor (MXC) exposure lineage F3 generation. 
Analysis of this positive validation data set showed 100% prediction accuracy for all the DDT-MXC sperm epimutations. These observations further elucidate the genomic features associated with transgenerational germline epimutations and identify a genome-wide set of potential epimutations that can be used to facilitate identification of epigenetic diagnostics for ancestral environmental exposures and disease susceptibility.
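The CpG desert criterion mentioned in the abstract (fewer than 3 CpG per 100 bp) can be illustrated with a short calculation. The sketch below is only an illustration of that density threshold; the function names `cpg_per_100bp` and `is_cpg_desert` are hypothetical, and averaging the density over the whole input sequence is an assumption, since the abstract does not state how windows were defined.

```python
def cpg_per_100bp(seq: str) -> float:
    """Return the number of CpG dinucleotides per 100 bp of sequence."""
    seq = seq.upper()
    if len(seq) < 2:
        return 0.0
    # "CG" cannot overlap itself, so str.count gives the exact CpG count.
    return seq.count("CG") * 100.0 / len(seq)


def is_cpg_desert(seq: str, threshold: float = 3.0) -> bool:
    """True when CpG density is below the threshold (CpG per 100 bp).

    The default of 3.0 follows the "<3 CpG / 100bp" figure in the abstract;
    applying it to the full sequence length is an assumption of this sketch.
    """
    return cpg_per_100bp(seq) < threshold
```

A 100 bp sequence with a single CpG would be flagged as desert-like under this sketch, while one with exactly three CpG sites would not, because the comparison is strict.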
Highlights
Epigenetics is defined as “molecular factors and processes around DNA that regulate genome activity independent of DNA sequence and are mitotically stable” [1].
The machine learning approach in this study (Fig 1A) uses the generalized query-based Active Learning (ACL) method to find the most important samples and features in the epigenetic datasets.
ACL was run separately on the Dioxin-Hydrocarbons (Jet Fuel)-Vinclozolin-Plastics-Pesticides (DHVPP) and SG datasets. Only features that appeared more than 5 times, together with some manually selected important features, were chosen as the most relevant features for further analysis (Fig 2B & 2C).
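The frequency-based feature filter described in the last highlight can be sketched as a simple count across runs. This is a minimal illustration, not the authors' implementation: the function name `select_features`, the representation of each ACL run as a list of feature names, and the parameters `min_count` and `manual` are all assumptions introduced here.

```python
from collections import Counter


def select_features(runs, min_count=5, manual=()):
    """Keep features seen in more than `min_count` runs, plus manual picks.

    `runs` is a hypothetical list of per-run feature-name lists; each
    feature is counted at most once per run.
    """
    counts = Counter(feature for run in runs for feature in set(run))
    # "more than 5 times" is read as a strict inequality.
    frequent = {feature for feature, n in counts.items() if n > min_count}
    return sorted(frequent | set(manual))
```

With this reading, a feature selected in exactly five runs is dropped unless it is also listed in `manual`.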
Summary
Epigenetics is defined as “molecular factors and processes around DNA that regulate genome activity independent of DNA sequence and are mitotically stable” [1]. The molecular factors currently known to be epigenetic processes include DNA methylation, histone modifications, chromatin structure and selected non-coding RNA [1, 3, 4, 5, 6, 7]. A combination of epigenetic and genetic molecular mechanisms will be essential for most biological processes. Genetics has been the primary molecular component considered for most aspects of biology, and DNA sequence and genetics have been considered the primary form of inheritance. Environmentally induced epigenetic transgenerational inheritance has been described in species from plants to humans [1]. This provides an additional epigenetic mechanism for inheritance to consider [9] and helps explain forms of familial inheritance not explained by classic genetics.