Predicting disease-related phenotypes using an integrated phenotype similarity measurement based on HPO

Abstract

Background: Improving the efficiency of disease diagnosis based on phenotype ontology is a critical yet challenging research area. Recently, Human Phenotype Ontology (HPO)-based semantic similarity has been effectively and widely used to identify causative genes and diseases. However, current phenotype similarity measurements consider only the annotations and hierarchy structure of HPO, neglecting the definitions of phenotype terms. Results: In this paper, we propose a novel phenotype similarity measurement, termed DisPheno, which incorporates the definitions of phenotype terms in addition to HPO structure and annotations to measure the similarity between phenotype terms. DisPheno also integrates phenotype term associations into phenotype-set similarity measurement using gene and disease annotations of phenotype terms. Conclusions: Compared with five existing state-of-the-art methods, DisPheno shows great performance in HPO-based phenotype semantic similarity measurement and improves the efficiency of disease identification, especially on noisy patient datasets.
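
The idea of combining ontology structure with term definitions can be illustrated with a toy sketch. This is not DisPheno's formula, which the abstract does not give. The mini-ontology, the definitions, the Jaccard measures, and the `weight` parameter below are all assumptions made for illustration.

```python
# Toy illustration only: the ontology, definitions, and weighting are invented
# and do not reproduce DisPheno.

# Hypothetical mini-ontology: each term maps to its direct parents.
PARENTS = {
    "root": [],
    "eye": ["root"],
    "vision": ["eye"],
    "color_blind": ["vision"],
    "night_blind": ["vision"],
    "heart": ["root"],
}

# Hypothetical free-text definitions of some terms.
DEFINITIONS = {
    "color_blind": "difficulty distinguishing colors",
    "night_blind": "difficulty seeing in low light",
    "heart": "abnormality of the heart",
}


def ancestors(term):
    """Return the term together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def jaccard(a, b):
    """Jaccard overlap of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def structure_sim(t1, t2):
    """Structural similarity: overlap of the two ancestor sets."""
    return jaccard(ancestors(t1), ancestors(t2))


def definition_sim(t1, t2):
    """Definition similarity: overlap of the words in the two definitions."""
    words1 = set(DEFINITIONS.get(t1, "").split())
    words2 = set(DEFINITIONS.get(t2, "").split())
    return jaccard(words1, words2)


def combined_sim(t1, t2, weight=0.5):
    """Weighted mix of structural and definition similarity (assumed form)."""
    return weight * structure_sim(t1, t2) + (1 - weight) * definition_sim(t1, t2)
```

In this toy setting, two sibling terms with overlapping definitions score higher than two terms that share only the root, which is the qualitative behaviour a definition-aware measure aims for.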

Similar Papers
  • Research Article
  • Cited by 37
  • 10.1016/j.jbi.2019.103246
HPO2Vec+: Leveraging heterogeneous knowledge resources to enrich node embeddings for the Human Phenotype Ontology.
  • Jun 27, 2019
  • Journal of Biomedical Informatics
  • Feichen Shen + 7 more

  • Research Article
  • Cited by 346
  • 10.1093/nar/gkad1005
The Human Phenotype Ontology in 2024: phenotypes around the world
  • Nov 11, 2023
  • Nucleic Acids Research
  • Michaela Zelinová + 99 more

The Human Phenotype Ontology (HPO) is a widely used resource that comprehensively organizes and defines the phenotypic features of human disease, enabling computational inference and supporting genomic and phenotypic analyses through semantic similarity and machine learning algorithms. The HPO has widespread applications in clinical diagnostics and translational research, including genomic diagnostics, gene-disease discovery, and cohort analytics. In recent years, groups around the world have developed translations of the HPO from English to other languages, and the HPO browser has been internationalized, allowing users to view HPO term labels and in many cases synonyms and definitions in ten languages in addition to English. Since our last report, a total of 2239 new HPO terms and 49235 new HPO annotations were developed, many in collaboration with external groups in the fields of psychiatry, arthrogryposis, immunology and cardiology. The Medical Action Ontology (MAxO) is a new effort to model treatments and other measures taken for clinical management. Finally, the HPO consortium is contributing to efforts to integrate the HPO and the GA4GH Phenopacket Schema into electronic health records (EHRs) with the goal of more standardized and computable integration of rare disease data in EHRs.

  • Preprint Article
  • Cited by 1
  • 10.7287/peerj.preprints.2259v1
Using ontologies and semantic similarity measures for prioritization of gene regulatory networks
  • Jul 6, 2016
  • Marianna Milano + 2 more

Omics sciences are widely used to analyze diseases at a molecular level. Usually, results of omics experiments are sets of candidate genes potentially involved in different diseases. The interpretation of results and the filtering of candidate genes or proteins selected in an experiment is a challenge in some scenarios. This problem is particularly evident in clinical environments, in which researchers are interested in the behavior of a few molecules related to a specific disease while results may contain thousands of entries. The filtering requires domain-specific knowledge that is usually encoded in ontologies. Consequently, to filter out false-positive genes, different approaches for selecting genes have been introduced. Such approaches are often referred to as gene prioritization methods. They aim to identify, through computational methods, the genes most related to a disease among a larger set of candidate genes. We implemented GoD (Gene ranking based On Diseases), an algorithm that ranks a given set of genes based on ontology annotations. The algorithm orders genes by the semantic similarity computed, with respect to a disease, between the annotations of each gene and those describing the selected disease. The current version of GoD enables the prioritization of a list of input genes for a selected disease. It uses the HPO (Human Phenotype Ontology), GO (Gene Ontology), and DO (Disease Ontology) ontologies to calculate the ranking. It takes as input a list of genes or gene products annotated with GO, HPO, or DO terms and a selected disease described by GO, HPO, or DO annotations (users may also provide novel annotations). It produces as output the ranking of those genes with respect to the input disease. The package consists of three main functions: hpoGoD (for HPO-based prioritization), goGoD (for GO-based prioritization), and doGoD (for DO-based prioritization).
We tested GoD on Gene Regulatory Networks (GRNs). Biological network inference aims to reconstruct a network of interactions (or associations) among genes starting from experimental observations. We selected three expression datasets: Dataset 1 (GDS3285), related to breast cancer; Dataset 2 (GDS5072), related to prostate cancer; and Dataset 3 (GDS5093), related to Dengue virus (DENV) infection. Initially, experimental data are given as input to five GRN inference algorithms, i.e., ARACNE, CLR, MRNET, GENIE3, and GGM, to produce five inferred GRNs. For each inferred GRN, GoD receives as input the list of top genes and produces for each gene a semantic similarity value for a selected disease using one of the previous ontologies (e.g., Disease Ontology). For each GRN, the genes are ranked and reordered on the basis of the computed semantic similarity, and the rankings are compared, which allows each GRN inference method to be ranked with respect to the initially selected disease.

  • Book Chapter
  • Cited by 1
  • 10.1016/b978-0-12-809633-8.20405-6
Gene Prioritization Using Semantic Similarity
  • May 3, 2018
  • Reference Module in Life Sciences
  • Erinija Pranckevičienė

  • Research Article
  • Cited by 79
  • 10.1371/journal.pone.0115692
HPOSim: an R package for phenotypic similarity measure and enrichment analysis based on the human phenotype ontology.
  • Feb 9, 2015
  • PLOS ONE
  • Yue Deng + 3 more

Background: Phenotypic features associated with genes and diseases play an important role in disease-related studies, and most of the available methods focus solely on the Online Mendelian Inheritance in Man (OMIM) database without considering the controlled vocabulary. The Human Phenotype Ontology (HPO) provides a standardized and controlled vocabulary covering phenotypic abnormalities in human diseases, and has become a comprehensive resource for computational analysis of human disease phenotypes. Most of the existing HPO-based software tools cannot be used offline and provide only a few similarity measures. Therefore, there is a critical need for comprehensive offline software for phenotypic similarity analysis based on HPO. Results: HPOSim is an R package for analyzing phenotypic similarity for genes and diseases based on HPO data. Seven commonly used semantic similarity measures are implemented in HPOSim. Enrichment analysis of gene sets and disease sets is also implemented, including hypergeometric enrichment analysis and network ontology analysis (NOA). Conclusions: HPOSim can be used to predict disease genes and explore disease-related functions of gene modules. HPOSim is open source and freely available at SourceForge (https://sourceforge.net/p/hposim/).
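
The hypergeometric enrichment analysis mentioned above can be sketched as a one-sided tail probability. This is a generic Python illustration, not HPOSim's R code. The function name and its parameters are assumptions, and multiple-testing correction is omitted.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, sample, overlap):
    """Probability of seeing at least `overlap` annotated genes when `sample`
    genes are drawn without replacement from `population` genes, of which
    `annotated` carry the phenotype term being tested."""
    upper = min(annotated, sample)
    total = comb(population, sample)
    # Sum the upper tail of the hypergeometric distribution.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total
```

For example, `hypergeom_enrichment_p(10, 5, 5, 5)` gives the chance that a 5-gene set drawn from 10 genes consists entirely of the 5 annotated ones, which is 1/252.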

  • Research Article
  • Cited by 5
  • 10.1016/j.procs.2020.03.421
Network based approach for discovering novel gene-phenotypic association and disease co morbidities using ontological data
  • Jan 1, 2020
  • Procedia Computer Science
  • K.S Lakshmi + 1 more

  • Research Article
  • Cited by 1
  • 10.1186/s44342-024-00024-1
Examining HPO by organ and system to facilitate practical use by clinicians
  • Nov 12, 2024
  • Genomics & Informatics
  • Eisuke Dohi + 4 more

The Human Phenotype Ontology (HPO) is widely used for annotating clinical text data, and sufficient annotation is crucial for the effective utilization of clinical texts. It is known that LLMs can successfully extract symptoms and findings but cannot annotate them with the HPO. We hypothesized that one potential cause is the lack of appropriate terms in the HPO. Therefore, during the Biomedical Linked Annotation Hackathon 8 (BLAH8), we attempted the following two tasks in order to grasp the overall picture of the HPO: (1) extract all HPO terms for each of the 23 HPO subclasses (defined as categories) directly under the HPO term "Phenotypic abnormality" and (2) search for major attributes in each of the 23 categories. We employed an LLM for these two tasks and, at the same time, found that the LLM did not work well without ingenuity for tasks that lacked sentences and context. A manual search for terms within each category revealed that the HPO contains a mix of terms with four major attributes: (1) Disease Name, (2) Condition, (3) Test Data, and (4) Symptoms and Findings. Manual curation showed that the ratio of symptoms and findings varied from 0 to 93.1% across categories. For clinicians, who are end users of medical terminology including the HPO, it is difficult to understand ontologies. However, a good-quality ontology is important for good-quality data, and a clinician’s help is essential. It is also important to make the overall picture and limitations of ontologies easy to understand in order to bring out the explanatory power of LLMs and artificial intelligence.

  • Research Article
  • 10.1093/ndt/gfad063c_4037
#4037 DEVELOPMENT OF AN EXPERT MODEL FOR THE DIAGNOSIS OF INHERITED KIDNEY DISEASES
  • Jun 14, 2023
  • Nephrology Dialysis Transplantation
  • Leonor Fayos De Arizón + 8 more

Background and Aims: Inherited kidney diseases (IKDs) account for 10-20% of cases of Chronic Kidney Disease (CKD). Mutations in over 400 genes are known as drivers of more than 150 monogenic kidney disorders. Clinical diagnosis is often challenging, as these disorders are characterized by numerous clinical features often shared by many diseases as well as very variable expression. Over the years, rare-disease databases such as OMIM and Orphanet have compiled information to facilitate diagnosis of rare diseases. Based on these, the Human Phenotype Ontology (HPO) developed a standardized directory for phenotypes with phenotype-disease associations. New phenotype-driven gene-prioritization tools such as Phenomizer have been developed in recent years using these resources; however, structural inaccuracies prevent proper uptake of these tools. To date, no specific renal tool has been developed. The aim of the study was to develop an expert model for the diagnosis of IKDs based on phenotype semantic similarity distances over Human Phenotype Annotations. Method: Using the MONDO ontology, which encompasses HPO phenotypes, OMIM and Orphanet, HPO annotations corresponding to 196 IKDs were exported and analyzed. Based on literature, original HPO annotations for each IKD were manually curated (inaccurate terms were eliminated, new known HPOs were added and new annotations were generated in case they were missing). Frequencies were added to each annotation. Specific kidney data such as CKD stage, age at need of kidney replacement therapy and degree of proteinuria were included. Age of onset, prevalence, and specific gender sub-analysis in the case of diseases with X-linked inheritance were also integrated. All curated phenotype-disease associations were reviewed by European experts. Using MONDO annotations for each IKD as synthetic patients, a statistical model with a user-friendly web-based interface was developed in Java.
Results: Analysis of current ontologies by experts in the field revealed that HPO terms associated with IKDs were inaccurate and non-specific. A new phenotype-driven tool fed with new curated terms and specific kidney data was found to be superior to current tools. Future contributions to ontologies such as HPO, MONDO and open data sources with curated terms and updated classifications concerning IKDs will help improve the diagnosis and characterization of IKDs. Conclusion: Current phenotype-based gene prioritization tools and rare disease databases have contributed significantly to improving the clinical diagnosis of rare diseases. However, these tools are nonspecific and imprecise when focusing on a specific field such as IKDs. Curation of current annotations by experts in the field will greatly improve the accuracy of these tools and facilitate clinical diagnosis of IKDs.

  • Research Article
  • Cited by 41
  • 10.1093/bioinformatics/btw649
Transfer learning across ontologies for phenome-genome association prediction.
  • Nov 23, 2016
  • Bioinformatics
  • Raphael Petegrosso + 3 more

To better predict and analyze gene associations with the collection of phenotypes organized in a phenotype ontology, it is crucial to effectively model the hierarchical structure among the phenotypes in the ontology and leverage the sparse known associations with additional training information. In this paper, we first introduce Dual Label Propagation (DLP) to impose consistent associations with the entire phenotype paths in predicting phenotype-gene associations in the Human Phenotype Ontology (HPO). DLP is then used as the base model in a transfer learning framework (tlDLP) to incorporate functional annotations in the Gene Ontology (GO). By simultaneously reconstructing GO term-gene associations and HPO phenotype-gene associations for all the genes in a protein-protein interaction network, tlDLP benefits from the enriched training associations indirectly through relation with GO terms. In the experiments to predict the associations between human genes and phenotypes in HPO based on a human protein-protein interaction network, both DLP and tlDLP improved the prediction of gene associations with phenotype paths in HPO in cross-validation and the prediction of the most recent associations added after the snapshot of the training data. Moreover, the transfer learning through GO term-gene associations significantly improved association predictions for the phenotypes with no more specific known associations by a large margin. Examples are also shown to demonstrate how phenotype paths in phenotype ontology and transfer learning with gene ontology can improve the predictions. Source code is available at http://compbio.cs.umn.edu/ontophenome. Contact: kuang@cs.umn.com. Supplementary data are available at Bioinformatics online.
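
For readers unfamiliar with label propagation, the sketch below shows the plain single-graph version on a gene network. It is not the paper's Dual Label Propagation or tlDLP, whose details the abstract does not give. The function name, the `alpha` value, and the fixed iteration count are assumptions.

```python
def label_propagation(adjacency, seeds, alpha=0.5, iterations=50):
    """Spread seed scores over a graph by repeating
    f <- alpha * S f + (1 - alpha) * y, where S is the symmetrically
    normalized adjacency matrix and y holds the known (seed) associations."""
    n = len(adjacency)
    degree = [sum(row) for row in adjacency]
    # S[i][j] = W[i][j] / sqrt(d_i * d_j); entries without an edge stay 0.
    norm = [
        [
            adjacency[i][j] / (degree[i] * degree[j]) ** 0.5 if adjacency[i][j] else 0.0
            for j in range(n)
        ]
        for i in range(n)
    ]
    scores = list(seeds)
    for _ in range(iterations):
        scores = [
            alpha * sum(norm[i][j] * scores[j] for j in range(n)) + (1 - alpha) * seeds[i]
            for i in range(n)
        ]
    return scores
```

On a chain of four genes where only the first is a known association, the resulting scores decrease with distance from the seed, so nearer neighbours in the interaction network are ranked as more likely candidates.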

  • Research Article
  • Cited by 3
  • 10.3389/frai.2025.1496937
Population health management through human phenotype ontology with policy for ecosystem improvement.
  • Aug 1, 2025
  • Frontiers in artificial intelligence
  • James Andrew Henry

The manuscript "Population Health Management (PHM) Human Phenotype Ontology (HPO) Policy for Ecosystem Improvement" stewards safe science and secure technology in medical reform. The digital HPO policy advances Biological Modelling (BM) capacity and capability in a series of fair classifications. Public trust in the PHM of HPO is a vision of public health and patient safety, with a primary goal of socioeconomic success sustained by citizen privacy and trust within an ecosystem of predictor equality and intercept parity. Science and technology security evaluation, resource allocation, and appropriate regulation are essential for establishing a solid foundation in a safe ecosystem. The AI Security Institute collaborates with higher experts to assess BM cybersecurity and privacy. Within this ecosystem, resources are allocated to the Genomic Medical Sciences Cluster and AI metrics that support safe HPO transformations. These efforts ensure that AI digital regulation acts as a service appropriate to steward progressive PHM. The manuscript presents a five-point mission for the effective management of population health. A comprehensive national policy for phenotype ontology with Higher Expert Medical Science Safety stewards reform across sectors. It emphasizes developing genomic predictors and intercepts, authorizing predictive health pre-eXams and precise care eXams, adopting Generative Artificial Intelligence classifications, and expanding the PHM ecosystem in benchmark reforms. Discussions explore medical reform focusing on public health and patient safety. The nation's safe space expansions with continual improvements include steward developing, authorizing, and adopting digital BM twins. The manuscript addresses international classifications where the global development of PHM enables nations to choose what to authorize for BM points of need. 
These efforts promote channels for adopting HPO uniformity, transforming research findings into routine phenotypical primary care practices. This manuscript charts the UK's and global PHM's ecosystem expansion, designing HPO policies that steward the modeling of biology in personal classifications. It develops secure, safe, fair, and explainable BM for public trust in authorized classifiers and promotes informed choices regarding what nations and individuals adopt in a cooperative PHM progression. Championing equitable classifications in a robust ecosystem sustains advancements in population health outcomes for economic growth and public health betterment.

  • Research Article
  • Cited by 8
  • 10.1007/s00439-022-02449-6
Predicting genes from phenotypes using human phenotype ontology (HPO) terms.
  • Mar 31, 2022
  • Human Genetics
  • Anne Slavotinek + 5 more

The interpretation of genomic variants following whole exome sequencing (WES) can be aided using human phenotype ontology (HPO) terms to standardize clinical features and predict causative genes. We performed WES on 453 patients diagnosed prior to 18 years of age and identified 114 pathogenic (P) or likely pathogenic (LP) variants in 112 patients. We utilized PhenoDB to extract HPO terms from provider notes and then used Phen2Gene to generate a gene score and gene ranking from each list of HPO terms. We assigned Phen2Gene gene rankings to 6 rank classes, with class 1 covering raw gene rankings of 1 to 10 and class 2 covering rankings from 11 to 50 out of a total of 17,126 possible gene rankings. Phen2Gene ranked causative genes into rank class 1 or 2 in 27.7% of cases, and the genes in rank class 1 were all associated with well-characterized phenotypes. We found significant associations between the gene score and the number of years since the gene was first published, the number of HPO terms with a hierarchical depth greater than or equal to 11, and the number of Online Mendelian Inheritance in Man terms associated with the phenotype and gene. We conclude that genes associated with recognizable phenotypes and terms deep in the HPO hierarchy have the best chance of producing a high gene score and ranking in class 1 to 2 using Phen2Gene software with HPO terms. Clinicians and laboratory staff should consider these results when HPO terms are employed to prioritize candidate genes.

  • Research Article
  • Cited by 13
  • 10.1186/s12864-018-4927-z
An online tool for measuring and visualizing phenotype similarities using HPO
  • Aug 1, 2018
  • BMC Genomics
  • Jiajie Peng + 7 more

Background: The Human Phenotype Ontology (HPO) is one of the most popular bioinformatics resources. Recently, HPO-based phenotype semantic similarity has been effectively applied to model patient phenotype data. However, the existing tools are adapted from Gene Ontology (GO)-based term similarity. The design of the models is not optimized for the unique features of HPO. In addition, existing tools only allow HPO terms as input and only provide pure text-based outputs. Results: We present PhenoSimWeb, a web application that allows researchers to measure HPO-based phenotype semantic similarities using four approaches borrowed from GO-based similarity measurements. In addition, we provide an approach that considers the unique properties of HPO. PhenoSimWeb also accepts text that describes phenotypes as input, since clinical phenotype data is always in text. PhenoSimWeb also provides a graphic visualization interface to visualize the resulting phenotype network. Conclusions: PhenoSimWeb is an easy-to-use and functional online application. Researchers can use it to calculate phenotype similarity conveniently, predict phenotype-associated genes or diseases, and visualize the network of phenotype interactions. PhenoSimWeb is available at http://120.77.47.2:8080.

  • Research Article
  • Cited by 59
  • 10.1186/1471-2105-15-248
Clinical phenotype-based gene prioritization: an initial study using semantic similarity and the human phenotype ontology
  • Jul 21, 2014
  • BMC Bioinformatics
  • Aaron J Masino + 8 more

Background: Exome sequencing is a promising method for diagnosing patients with a complex phenotype. However, variant interpretation relative to patient phenotype can be challenging in some scenarios, particularly clinical assessment of rare complex phenotypes. Each patient’s sequence reveals many possibly damaging variants that must be individually assessed to establish clear association with patient phenotype. To assist interpretation, we implemented an algorithm that ranks a given set of genes relative to patient phenotype. The algorithm orders genes by the semantic similarity computed between phenotypic descriptors associated with each gene and those describing the patient. Phenotypic descriptor terms are taken from the Human Phenotype Ontology (HPO) and semantic similarity is derived from each term’s information content. Results: Model validation was performed via simulation and with clinical data. We simulated 33 Mendelian diseases with 100 patients per disease. We modeled clinical conditions by adding noise and imprecision, i.e., phenotypic terms unrelated to the disease and terms less specific than the actual disease terms. We ranked the causative gene against all 2488 HPO-annotated genes. The median causative gene rank was 1 for the optimal and noise cases, 12 for the imprecision case, and 60 for the imprecision with noise case. Additionally, we examined a clinical cohort of subjects with hearing impairment. The disease gene median rank was 22. However, when also considering the patient’s exome data and filtering non-exomic and common variants, the median rank improved to 3. Conclusions: Semantic similarity can rank a causative gene highly within a gene list relative to patient phenotype characteristics, provided that imprecision is mitigated. The clinical case results suggest that phenotype rank combined with variant analysis provides significant improvement over the individual approaches. We expect that this combined prioritization approach may increase accuracy and decrease effort for clinical genetic diagnosis. Electronic supplementary material: The online version of this article (doi:10.1186/1471-2105-15-248) contains supplementary material, which is available to authorized users.
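
Ranking genes by information-content-based semantic similarity can be sketched as follows. The abstract does not name a specific measure, so this sketch assumes Resnik similarity (information content of the most informative common ancestor) and a best-match average over the patient's terms. The mini-ontology and gene annotations are invented for illustration.

```python
from math import log

# Hypothetical mini-ontology: each term maps to its direct parents.
PARENTS = {
    "root": [],
    "ear": ["root"],
    "hearing_loss": ["ear"],
    "sensorineural": ["hearing_loss"],
    "conductive": ["hearing_loss"],
    "eye": ["root"],
    "cataract": ["eye"],
}

# Hypothetical gene-to-phenotype annotations.
GENE_TERMS = {
    "GENE_A": {"sensorineural"},
    "GENE_B": {"conductive"},
    "GENE_C": {"cataract"},
}


def ancestors(term):
    """Return the term together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content():
    """IC(t) = log(N / n_t), with n_t the number of genes annotated to t
    or to any of its descendants, and N the total number of genes."""
    counts = {term: 0 for term in PARENTS}
    for terms in GENE_TERMS.values():
        covered = set()
        for term in terms:
            covered |= ancestors(term)
        for term in covered:
            counts[term] += 1
    total = len(GENE_TERMS)
    return {term: log(total / c) for term, c in counts.items() if c > 0}


IC = information_content()


def resnik(t1, t2):
    """IC of the most informative ancestor shared by both terms."""
    common = ancestors(t1) & ancestors(t2)
    return max(IC.get(term, 0.0) for term in common)


def set_similarity(patient_terms, gene_terms):
    """Average, over patient terms, of the best match among the gene's terms."""
    best = [max(resnik(p, g) for g in gene_terms) for p in patient_terms]
    return sum(best) / len(best)


def rank_genes(patient_terms):
    """Return gene names ordered from most to least similar to the patient."""
    scores = {g: set_similarity(patient_terms, t) for g, t in GENE_TERMS.items()}
    return sorted(scores, key=scores.get, reverse=True)
```

With a patient described only by the toy term `sensorineural`, `rank_genes` places `GENE_A` first, the gene annotated with the sibling hearing term second, and the unrelated eye gene last. Replacing the patient term with the less specific `hearing_loss` makes `GENE_A` and `GENE_B` tie, a small-scale version of the imprecision effect described in the results.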

  • Preprint Article
  • 10.7490/f1000research.1111752.1
Enhancing the human phenotype ontology for use by the layperson
  • Apr 21, 2016
  • F1000Research
  • Nicole Vasilevsky + 6 more

Many diseases present with distinct phenotypes, making descriptions of phenotypes valuable for identifying and diagnosing human diseases. The Human Phenotype Ontology (HPO) was developed to provide a structured vocabulary containing textual and logical descriptions of human phenotypes. The HPO is used for phenotype-genotype alignment in systems like the Monarch Initiative to provide disorder prediction, variant prioritization, and patient matching between known diseases and model organisms. Here we describe recent work to extend the utility of the HPO through the systematic addition of approximately 6,000 synonyms. Until now, most of the HPO synonyms were composed of clinical terms unfamiliar to patients. For example, a patient may know they are ‘color-blind’, but may not be familiar with its official phenotype term ‘Dyschromatopsia’. Therefore, our goal is to add synonyms in “layperson-ese” so that HPO can be used by patients as well as basic research scientists and clinicians to help improve disease characterization and diagnosis. We systematically reviewed current HPO classes (approximately 12,000) and assigned layperson synonyms to each class where applicable. The layperson synonyms refer to colloquial terms used to describe phenotypic features associated with medical conditions. Each layperson synonym was annotated to indicate its special status, then classified as either exact (precise); broad (more general); narrow (more specific); or related (associated). The review process included various methods of identifying and validating possible layperson synonyms. We first queried the HPO to avoid duplicate terms. We then batched similar kinds of terms together, such as those related to bone abnormalities, to maintain consistent synonym terminology. For example, the phenotypes of the femur were assigned layperson synonym ‘of thigh bone’ and morphological abnormalities were described as ‘abnormal shape of ’. 
We consulted online resources (e.g., Wikipedia, Mayo Clinic) as well as specialized resources (e.g., Uberon, Gene Ontology) to find additional synonyms. As a quality control measure, we reviewed each other’s work, consulted with clinical experts when necessary, and queried Google for the assigned layperson term to verify that it retrieved the appropriate medical term and was in use. Some challenges of assigning layperson synonyms involved reconciling lay terms with the logic and structure of the HPO and determining the best mechanism to validate the lay synonyms. Additionally, not every term has a lay synonym or it may already exist in the HPO, such as ‘widow’s peak’ or ‘hitch hiker’s thumb’. Finally, some terms have complicated medical terminology, like ‘short distal phalanx of first finger’, for which a single layperson term is difficult to establish without using the definition of the term. The addition of layperson synonyms increases the usability of the HPO, making it useful for data interoperability across clinicians and patients. Additionally, this work will enable crowdsourcing by citizen scientists. The layperson synonyms are available in the latest release of the HPO.

  • Research Article
  • Cited by 19
  • 10.1186/s12920-019-0625-1
HPOAnnotator: improving large-scale prediction of HPO annotations by low-rank approximation with HPO semantic similarities and multiple PPI networks
  • Dec 1, 2019
  • BMC Medical Genomics
  • Junning Gao + 5 more

Background: As a standardized vocabulary of phenotypic abnormalities associated with human diseases, the Human Phenotype Ontology (HPO) has been widely used by researchers to annotate phenotypes of genes/proteins. To save the cost and time spent on experiments, many computational approaches have been proposed. They are able to alleviate the problem to some extent, but their performance is still far from satisfactory. Method: For inferring large-scale protein-phenotype associations, we propose HPOAnnotator, which incorporates information from multiple Protein-Protein Interaction (PPI) networks and the hierarchical structure of HPO. Specifically, we use a dual graph to regularize Non-negative Matrix Factorization (NMF) in a way that the information from different sources can be seamlessly integrated. In essence, HPOAnnotator solves the sparsity problem of a protein-phenotype association matrix by using a low-rank approximation. Results: By combining the hierarchical structure of HPO and co-annotations of proteins, our model can capture the HPO semantic similarities well. Moreover, graph Laplacian regularizations are imposed in the latent space so as to utilize multiple PPI networks. The performance of HPOAnnotator has been validated under cross-validation and an independent test. Experimental results have shown that HPOAnnotator outperforms the competing methods significantly. Conclusions: Through extensive comparisons with the state-of-the-art methods, we conclude that the proposed HPOAnnotator is able to achieve superior performance as a result of using a low-rank approximation with a graph regularization. It is promising in that our approach can be considered as a starting point to study more efficient matrix factorization-based algorithms.
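
As a rough guide to what a dual-graph-regularized NMF looks like, a generic textbook-style objective is shown below. It is an assumed general form, not HPOAnnotator's exact objective, which the abstract does not state. All symbols are assumptions: X is the protein-phenotype association matrix, U and V are the non-negative latent factors for proteins and phenotypes, L_P is the graph Laplacian of a PPI network, L_H is the Laplacian of an HPO-derived phenotype similarity graph, and alpha and beta are regularization weights.

```latex
\min_{U \ge 0,\; V \ge 0} \;
  \lVert X - U V^{\top} \rVert_F^{2}
  + \alpha \, \operatorname{Tr}\!\left( U^{\top} L_P \, U \right)
  + \beta  \, \operatorname{Tr}\!\left( V^{\top} L_H \, V \right)
```

The first term is the low-rank reconstruction error. The two trace terms penalize latent representations that differ between proteins linked in the PPI network and between phenotypes that are similar in HPO. With several PPI networks, one such Laplacian term per network would be a natural extension.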
