Despite vast data support in DNA methylation (DNAm) biomarker discovery to facilitate health-care research, this field faces huge resource barriers due to preliminary unreliable candidates and the consequent compensations using expensive experiments. The underlying challenges lie in the confounding factors, especially measurement noise and individual characteristics. To achieve reliable identification of a candidate pool for DNAm biomarker discovery, we propose a Causality-driven Deep Regularization framework to reinforce correlations that are suggestive of causality with disease. It integrates causal thinking, deep learning, and biological priors to handle non-causal confounding factors, through a contrastive scheme and a spatial-relation regularization that reduces interferences from individual characteristics and noises, respectively. The comprehensive reliability of the proposed method was verified by simulations and applications involving various human diseases, sample origins, and sequencing technologies, highlighting its universal biomedical significance. Overall, this study offers a causal-deep-learning-based perspective with a compatible tool to identify reliable DNAm biomarker candidates, promoting resource-efficient biomarker discovery.
Read full abstract