Abstract

Background

The plasma proteome provides information to identify dysregulated molecular pathways underlying disease. Cerebrovascular diseases often manifest a significantly different plasma proteomic signature. Cerebral Autosomal Dominant Arteriopathy with Subcortical Infarcts and Leukoencephalopathy (CADASIL) is one such vascular brain disease. The CADASIL plasma proteome has yet to be investigated.

Method

To investigate the plasma proteome in the context of CADASIL, we used large-scale proteomics, measuring over 7,000 proteins via aptamer-based technology (SomaLogic) in 53 study participants (CADASIL, n = 25; control, n = 28). We employed machine learning (ML) methods as an unbiased approach to uncover disease-associated changes in proteomic networks. This approach is based on the premise that the protein sets that best classify disease state could be important biological drivers of disease. We developed a novel ML method that couples recursive feature extraction with logistic regression (LR) and XGBoost evaluators, as well as maximum-relevance-minimum-redundancy with random forest and F-statistic evaluators. The results of all four models were selectively aggregated to delineate the CADASIL plasma proteome signature while minimizing overfitting to the small sample. From these results we developed a 45-protein model of the CADASIL proteomic signature. To evaluate its classification ability, we measured the repeated stratified 10-fold cross-validation accuracy of an LR classifier. Next, to investigate whether the CADASIL proteomic signature was relevant for other neurodegenerative diseases with vascular components, we applied the classifier to an Alzheimer’s disease (AD) dataset. Finally, the functional signature of the 45 proteins was investigated through gene ontology (GO) analysis.

Result

The LR classifier achieved 99.8% accuracy in CADASIL. In AD, the accuracy was 71.2%. This implies that the CADASIL signature shares commonalities, as well as pertinent differences, with other neurodegenerative diseases. GO analysis of the 45 proteins yielded terms corresponding to extracellular matrix and collagen, cellular membrane lipoproteins and glycolipids, and proteasome biological pathways.

Conclusion

The proposed method allows for an initial unbiased discovery of important proteomic changes associated with disease state. The identified proteins would provide starting points for mechanistic studies. In addition, a panel of proteins could be used for population-level screening, for risk stratification and more in-depth phenotyping. Next steps include investigating the biological relevance of the findings in mechanistic models for therapeutic development.
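
As a rough illustration of one branch of the Method, the sketch below pairs recursive feature selection using a logistic regression evaluator with repeated stratified 10-fold cross-validation of an LR classifier. It is a minimal sketch under stated assumptions, not the authors' pipeline: the data are synthetic (53 samples split 28/25, with 200 random features standing in for the roughly 7,000 proteins), it uses scikit-learn's `RFE` (recursive feature elimination) as a stand-in for the recursive selection step, and the scaling step, the elimination step size, and the number of repeats are all choices made here rather than values taken from the abstract. The XGBoost, maximum-relevance-minimum-redundancy, and aggregation steps are omitted.

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption): 28 controls (0) and 25 cases (1),
# 200 features, of which the first 10 carry a shifted mean in cases.
rng = np.random.default_rng(0)
y = np.array([0] * 28 + [1] * 25)
X = rng.normal(size=(53, 200))
X[y == 1, :10] += 1.0

# Feature selection sits inside the pipeline so that it is refit on each
# training fold; selecting on the full data first would leak label
# information into the held-out folds.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("rfe", RFE(LogisticRegression(max_iter=1000),
                n_features_to_select=45,  # panel size reported in the abstract
                step=0.1)),               # drop 10% of features per round (assumption)
    ("clf", LogisticRegression(max_iter=1000)),
])

# Repeated stratified 10-fold CV; 3 repeats is an arbitrary choice here.
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=0)
scores = cross_val_score(pipe, X, y, cv=cv, scoring="accuracy")
mean_accuracy = scores.mean()

# Refit on all samples to inspect how many features the selector kept.
n_selected = int(pipe.fit(X, y).named_steps["rfe"].support_.sum())

print(f"mean CV accuracy: {mean_accuracy:.3f} over {len(scores)} folds")
print(f"features kept by RFE: {n_selected}")
```

The abstract does not say whether selection was nested inside cross-validation. The sketch nests it because, with only 53 samples and thousands of candidate features, selecting features on the full dataset before cross-validating tends to inflate the estimated accuracy.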

