Biological Datasets Research Articles

Summary

Background: Arterial hypertension is a major cardiovascular risk factor. Identification of secondary hypertension in its various forms is key to preventing and targeting treatment of cardiovascular complications. Simplified diagnostic tests are urgently required to distinguish primary and secondary hypertension to address the current underdiagnosis of the latter.

Methods: This study uses machine learning (ML) to classify subtypes of endocrine hypertension (EHT) in a large cohort of hypertensive patients using multidimensional omics analysis of plasma and urine samples. We measured 409 multi-omics (MOmics) features including plasma miRNAs (PmiRNA: 173), plasma catechol O-methylated metabolites (PMetas: 4), plasma steroids (PSteroids: 16), urinary steroid metabolites (USteroids: 27), and plasma small metabolites (PSmallMB: 189) in primary hypertension (PHT) patients, EHT patients with either primary aldosteronism (PA), pheochromocytoma/functional paraganglioma (PPGL) or Cushing syndrome (CS), and normotensive volunteers (NV). Biomarker discovery involved selection of disease combination, outlier handling, feature reduction, 8 ML classifiers, class balancing and consideration of different age- and sex-based scenarios. Classifications were evaluated using balanced accuracy, sensitivity, specificity, AUC, F1, and Kappa score.

Findings: Complete clinical and biological datasets were generated from 307 subjects (PA=113, PPGL=88, CS=41 and PHT=112). The random forest classifier provided ∼92% balanced accuracy (∼11% improvement on the best mono-omics classifier), with 96% specificity and 0.95 AUC to distinguish one of the four conditions in multi-class ALL-ALL comparisons (PPGL vs PA vs CS vs PHT) on an unseen test set, using 57 MOmics features. For discrimination of EHT (PA + PPGL + CS) vs PHT, the simple logistic classifier achieved 0.96 AUC with 90% sensitivity and ∼86% specificity, using 37 MOmics features. One PmiRNA (hsa-miR-15a-5p) and two PSmallMB (C9 and PC ae C38:1) features were found to be most discriminating for all disease combinations. Overall, the MOmics-based classifiers provided better classification performance than the mono-omics classifiers.

Interpretation: We have developed an ML pipeline to distinguish different EHT subtypes from PHT using multi-omics data. This innovative approach to stratification is an advancement towards the development of a diagnostic tool for EHT patients, significantly increasing testing throughput and accelerating administration of appropriate treatment.

Funding: European Union's Horizon 2020 Research and Innovation Programme under Grant Agreement No. 633983, Clinical Research Priority Program of the University of Zurich for the CRPP HYRENE (to Z.E. and F.B.), and Deutsche Forschungsgemeinschaft (CRC/Transregio 205/1).
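As a reading aid for the evaluation metrics named in the abstract, the following minimal sketch computes sensitivity, specificity, balanced accuracy, F1, and Cohen's Kappa for a binary comparison such as EHT vs PHT. It is not the authors' pipeline: the function names and the toy labels are assumptions made for illustration, and AUC is left out because it requires classifier scores rather than hard predictions.

```python
# Illustrative only: toy labels, not data from the study.
# Function names (confusion_counts, summarize) are hypothetical.

def safe_div(num, den):
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fn = fp = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    return tp, fn, fp, tn


def summarize(tp, fn, fp, tn):
    """Derive the threshold-based metrics listed in the abstract."""
    n = tp + fn + fp + tn
    sensitivity = safe_div(tp, tp + fn)   # recall on the positive class
    specificity = safe_div(tn, tn + fp)   # recall on the negative class
    precision = safe_div(tp, tp + fp)
    f1 = safe_div(2 * precision * sensitivity, precision + sensitivity)
    observed = safe_div(tp + tn, n)       # plain accuracy
    # Agreement expected by chance, from the row and column marginals.
    expected = safe_div((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn), n * n)
    kappa = safe_div(observed - expected, 1 - expected)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "f1": f1,
        "kappa": kappa,
    }


# Toy example: 10 positives (e.g. EHT) and 10 negatives (e.g. PHT).
y_true = [1] * 10 + [0] * 10
y_pred = [1] * 9 + [0] * 1 + [1] * 2 + [0] * 8
metrics = summarize(*confusion_counts(y_true, y_pred))
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Balanced accuracy averages the two per-class recalls, which is why it is a common choice when group sizes differ, as they do across the disease groups in this cohort.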


Poster session 3, September 23, 2022, 12:30 PM - 1:30 PM

Objective: To identify and produce novel biomarkers with potential use for the specific diagnosis of H. capsulatum infection.

Methods: We designed a novel strategy to search for and select new candidate biomarker genes. It integrates a computational analysis model, which applies bioinformatic tools such as OrthoMCL, BLASTp, TargetP, and SignalP to a local proteome database collected manually from GenBank-NCBI, with the analysis of previously published biological and experimental datasets: a secreted proteome database obtained from pathogenic yeast-phase H. capsulatum culture filtrates, a Histoplasma yeast and mycelial transcriptome database, and a urine peptide database from Histoplasma-immunoassay-positive patients. For the synthesis of the candidates, an internal protocol for the production of recombinant proteins in prokaryotic and eukaryotic systems was applied. Polyclonal antibodies (PAb) specific for each biomarker were obtained by adapting a rapid immunization protocol for BALB/c mice. Finally, the computational model was experimentally validated by evaluating the reactivity and specificity of anti-Histoplasma PAb with fungal culture extracts and samples from patients with histoplasmosis.

Results: Using the computational analysis model, 2 candidate genes for diagnostic biomarkers were identified. An expression vector was then constructed for each candidate, and the corresponding proteins were produced using a standardized protocol for recombinant protein production. Anti-Histoplasma PAb were obtained and shown to be reactive against purified H. capsulatum antigens. Finally, we confirmed the presence of these antigens in yeast culture extracts of H. capsulatum and demonstrated the immunoreactivity of anti-Histoplasma PAb with urine samples from patients previously diagnosed with histoplasmosis.

Conclusion: The generation of novel strategies that combine data analysis, computational tools, and transcriptomic and proteomic techniques could be very useful for the identification of new biomarker genes and the development of microbiological diagnostic tests for important pathogens.
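The abstract does not state how the computational predictions and the published datasets were combined. One plausible reading, sketched below purely as an assumption, is that a candidate must pass every filter, which amounts to a set intersection. All protein identifiers, set names, and the function name are hypothetical placeholders and are unrelated to the study's actual candidates.

```python
# Hypothetical sketch: the intersection rule and all identifiers are
# assumptions for illustration, not the protocol described in the abstract.

def select_candidates(species_specific, predicted_secreted,
                      secretome, yeast_expressed, urine_detected):
    """Keep proteins supported by every computational and experimental filter."""
    # Computational evidence: no orthologs in other proteomes (as an
    # orthology or similarity search might suggest) and a predicted
    # secretion signal (as a signal-peptide predictor might suggest).
    computational = species_specific & predicted_secreted
    # Experimental evidence from previously published datasets.
    experimental = secretome & yeast_expressed & urine_detected
    return sorted(computational & experimental)


# Placeholder identifiers standing in for proteome entries.
species_specific = {"prot_A", "prot_B", "prot_C", "prot_D"}
predicted_secreted = {"prot_A", "prot_C", "prot_D", "prot_E"}
secretome = {"prot_A", "prot_D", "prot_E", "prot_F"}
yeast_expressed = {"prot_A", "prot_B", "prot_D", "prot_F"}
urine_detected = {"prot_A", "prot_D", "prot_G"}

candidates = select_candidates(species_specific, predicted_secreted,
                               secretome, yeast_expressed, urine_detected)
print(candidates)
```

A strict intersection favors specificity over recall; a scoring scheme that ranks proteins by the number of supporting datasets would be a looser alternative.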


Articles published on Biological Datasets

LearnMSA: learning and aligning large protein families.

Mapping the impacts of multiple stressors on the decline in kelps along the coast of Victoria, Australia

Computational Prediction of Drug-Disease Association Based on Graph-Regularized One Bit Matrix Completion.

Predicting miRNA-Disease Association Based on Improved Graph Regression.

Integrating research infrastructures into infectious diseases surveillance operations: Focus on biobanks

Interpretable machine learning methods for predictions in systems biology from omics data.

A Machine Learning-Based Approach Using Multi-omics Data to Predict Metabolic Pathways.

Omics Data and Data Representations for Deep Learning-Based Predictive Modeling.

Obtaining genetics insights from deep learning via explainable artificial intelligence.

Machine learning for classification of hypertension subtypes using multi-omics: A multi-centre, retrospective, data-driven study.

Distributed Big Data Storage Infrastructure for Biomedical Research Featuring High-Performance and Rich-Features

P440 General protocol applied to the identification and production of new biomarkers with potential use for the diagnosis of histoplasmosis

Tracking mutational semantics of SARS-CoV-2 genomes

RNA velocity unraveled.

A model and cooperative co-evolution algorithm for identifying driver pathways based on the integrated data and PPI network

Analysis of the Contribution of Intrinsic Disorder in Shaping Potyvirus Genetic Diversity.

A Multimodal Framework for Improving in Silico Drug Repositioning With the Prior Knowledge From Knowledge Graphs.

Competence of medicinal plant database using data mining algorithms for large biological databases

The diagnostic potential and barriers of microbiome based therapeutics.

Information Theory as an Experimental Tool for Integrating Disparate Biophysical Signaling Modules.
