Knowledge Graphs
In this article, we provide a comprehensive introduction to knowledge graphs, which have recently garnered significant attention from both industry and academia in scenarios that require exploiting diverse, dynamic, large-scale collections of data. After some opening remarks, we motivate and contrast various graph-based data models, as well as languages used to query and validate knowledge graphs. We explain how knowledge can be represented and extracted using a combination of deductive and inductive techniques. We conclude with high-level future research directions for knowledge graphs.
- Research Article
4
- 10.3390/info14060336
- Jun 15, 2023
- Information
Knowledge graphs are graph-based data models that can represent real-time data which grows constantly as new information is added. Question-answering systems over knowledge graphs (KGQA) retrieve answers to a natural language question from the knowledge graph. Most existing KGQA systems use static knowledge bases for offline training. After deployment, they fail to learn from unseen new entities added to the graph. There is a need for dynamic algorithms which can adapt to evolving graphs and give interpretable results. In this research work, we propose using new auction algorithms for question answering over knowledge graphs. These algorithms can adapt to changing environments in real time, making them suitable for offline and online training. An auction algorithm computes paths connecting an origin node to one or more destination nodes in a directed graph and uses node prices to guide the search for the path. The prices are initially assigned arbitrarily and updated dynamically based on defined rules. The algorithm navigates the graph from the high-price to the low-price nodes. When nodes and edges are dynamically added or removed in an evolving knowledge graph, the algorithm can adapt by reusing the prices of existing nodes and assigning arbitrary prices to the new nodes. For subsequent related searches, the “learned” prices provide the means to “transfer knowledge” and act as a “guide” that steers the search toward the lower-priced nodes. Our approach reduces the computational effort of the search by 60% in our experiments, thus making the algorithm computationally efficient. The resulting path given by the algorithm can be mapped to the attributes of entities and relations in knowledge graphs to provide an explainable answer to the query. We discuss some applications for which our method can be used.
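To make the price-guided search more concrete, here is a minimal Python sketch of the classic auction shortest-path iteration (extend or contract a single path while raising node prices). It is not the variant proposed in the abstract above: it assumes zero initial prices and strictly positive arc costs rather than arbitrary prices, and the function name `auction_shortest_path` and the toy graph are hypothetical.

```python
import math


def auction_shortest_path(graph, origin, dest):
    """Classic auction path search (illustrative sketch).

    graph: dict mapping node -> dict of successor -> positive arc cost.
    Returns (path, prices); path is None if dest is unreachable.
    """
    # Assumption: all prices start at zero, which satisfies
    # price[i] <= cost(i, j) + price[j] for positive arc costs.
    prices = {node: 0.0 for node in graph}
    for successors in graph.values():
        for j in successors:
            prices.setdefault(j, 0.0)

    path = [origin]
    while path[-1] != dest:
        i = path[-1]
        successors = graph.get(i, {})
        if successors:
            # Successor that minimizes arc cost plus its current price.
            best_j = min(successors, key=lambda j: successors[j] + prices[j])
            best = successors[best_j] + prices[best_j]
        else:
            best_j, best = None, math.inf

        if prices[i] < best:
            # Raise the price of the terminal node, then contract the path.
            prices[i] = best
            if len(path) > 1:
                path.pop()
            elif best == math.inf:
                return None, prices  # origin has no way forward
        else:
            # Extend the path toward the most attractive successor.
            path.append(best_j)
    return path, prices


toy_graph = {
    "a": {"b": 1, "c": 4},
    "b": {"c": 1, "d": 5},
    "c": {"d": 1},
    "d": {},
}
path, prices = auction_shortest_path(toy_graph, "a", "d")
print(path, prices["a"] - prices["d"])
```

On termination the path runs from higher-priced to lower-priced nodes, and the price difference between origin and destination equals the path cost. Reusing learned prices on a changed graph, as the abstract describes, is not covered by this sketch.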
- Conference Article
3
- 10.1109/bigdata.2017.8257927
- Dec 1, 2017
Knowledge graphs are graph-based data models which employ named nodes and edges to capture differentiation among entities and relationships in richly diverse data collections such as in the biomedical domain. The flexibility of knowledge graphs allows for heterogeneous collections to be linked and integrated in precise ways. However, resulting data models often have irregular structures which are not easy to manage using platforms for structured, schema-first data models like the relational model. To facilitate exchange, inter-operability and reuse of data, standards such as Resource Description Framework (RDF) have been increasingly adopted for representation. Domains such as the biomedical now have large collections of publicly available RDF graphs as well as benchmark workloads. To achieve scalability in data processing, some efforts are being made to build on distributed processing platforms such as Hadoop and Spark. However, while some distributed graph platforms have emerged for certain classes of mining workloads for non-semantic graphs (without typed edges and nodes), knowledge graph processing, which often involves ontological inferencing, continues to be plagued by scalability and efficiency challenges. In this paper, we present the design of a Hadoop-based storage architecture for knowledge graphs that overcomes some of the challenges of big RDF data processing. The rationale of the design strategy is to go beyond the traditional approach of exploiting structural properties of graphs while storing to include exploitation of semantic properties of knowledge graphs. Our system SemStorm is a Hadoop-based indexed, polymorphic, signatured file organization that supports efficient storage of data collections with significant data heterogeneity. Naive storage models for such data place more demands for meta-data management than traditional systems can support. 
The polymorphic file organization is further coupled with a nested, column-oriented file format to enable discriminatory data access based on queries. A major hallmark of SemStorm is that it enables semantic awareness in the storage framework. The idea is to exploit the knowledge represented in the ontologies that accompany data to optimize data storage models, for example by identifying and managing (sometimes implicit) data redundancies. Another major advantage of SemStorm is that it derives optimized storage models for data autonomically, i.e., without user input. Extensive experiments conducted on real-world and synthetic benchmark datasets show that SemStorm is up to 10X faster than existing approaches.
- Research Article
156
- 10.1007/s11704-016-5228-9
- Sep 26, 2016
- Frontiers of Computer Science
Information on the Internet is fragmented and spread across different data sources, which makes automatic knowledge harvesting and understanding formidable for machines, and even for humans. In recent years, knowledge graphs have become prevalent in both industry and academia as one of the most efficient and effective approaches to knowledge integration. Techniques for knowledge graph construction can mine information from structured, semi-structured, or even unstructured data sources, and finally integrate the information into knowledge represented as a graph. Furthermore, a knowledge graph is able to organize information in an easy-to-maintain, easy-to-understand and easy-to-use manner. In this paper, we summarize techniques for constructing knowledge graphs. We review the existing knowledge graph systems developed by both academia and industry. We discuss in detail the process of building knowledge graphs, and survey state-of-the-art techniques for automatic knowledge graph checking and expansion via logical inference and reasoning. We also review the issues of graph data management by introducing the knowledge data models and graph databases, especially from a NoSQL point of view. Finally, we give an overview of current knowledge graph systems and discuss future research directions.
- Research Article
10
- 10.1007/s11390-006-0430-0
- May 1, 2006
- Journal of Computer Science and Technology
In this paper, a Graph-based semantic Data Model (GDM) is proposed with the primary objective of bridging the gap between the human perception of an enterprise and the need of the computing infrastructure to organize information in a particular manner for efficient storage and retrieval. The GDM has been proposed as an alternative data model that combines the advantages of the relational model with the positive features of semantic data models. The proposed GDM offers the designer a structural representation to interact with, making it easy to comprehend the complex relations amongst basic data items. GDM allows an entire database to be viewed as a Graph (V, E) in a layered organization. Here, a graph is created in a bottom-up fashion where V represents the basic instances of data or a functionally abstracted module, called a primary semantic group (PSG) or a secondary semantic group (SSG). An edge in the model implies a relationship among the secondary semantic groups. The contents of the lowest layer are the semantically grouped data values in the form of primary semantic groups. The SSGs are higher-level abstractions created by encapsulating various PSGs, SSGs and basic data elements. This encapsulation continues to generate secondary semantic groups until the designer considers the result sufficient to describe the actual problem domain. GDM thus uses the standard abstractions available in a semantic data model with a structural representation in terms of a graph. The operations on the data model are formalized in the proposed graph algebra. A Graph Query Language (GQL) is also developed, maintaining similarity with the widely accepted, user-friendly SQL. 
Finally, the paper also presents the methodology to make this GDM compatible with the distributed environment, and a corresponding query processing technique for distributed environment is also suggested for the sake of completeness.
- Research Article
3
- 10.1080/17538947.2025.2512060
- Jun 3, 2025
- International Journal of Digital Earth
The urban physical examination is pivotal in diagnosing and resolving ‘urban diseases’. However, it encounters challenges including the intricate interconnections among heterogeneous data, the spatiotemporal differences in urban pathologies, and the multi-dimensional scenario-oriented representation. A knowledge graph is a potent technical instrument capable of depicting data replete with relational details and capturing the dependency ties among entities. In view of the above, this paper proposes a knowledge graph-based spatiotemporal data model for the urban physical examination. The model focuses on application scenarios, and builds a conceptual model with a three-layer architecture of ‘semantics-data-scenario’ and an ontology logic model structured as a hypergraph. This is intended to facilitate the management of spatiotemporal data throughout the process, while also enabling scenario-based spatiotemporal expression. Furthermore, this paper presents an in-depth analysis of the urban greenway construction case. The results show that the model organizes and expresses urban data and knowledge, covering multiple levels (such as themes, connotations, knowledge points, and indicators) and multiple granularities (macro-city, meso-region, micro-street), helping to understand urban physical examination from multiple dimensions. This not only provides a scientific basis for urban planners to make decisions but also sets an example for the practice of urban physical examination knowledge service.
- Research Article
75
- 10.3390/fi14050129
- Apr 24, 2022
- Future Internet
Knowledge graphs have, for the past decade, been a hot topic both in public and private domains, typically used for large-scale integration and analysis of data using graph-based data models. One of the central concepts in this area is the Semantic Web, with the vision of providing a well-defined meaning to information and services on the Web through a set of standards. Particularly, linked data and ontologies have been quite essential for data sharing, discovery, integration, and reuse. In this paper, we provide a systematic literature review on knowledge graph creation from structured and semi-structured data sources using Semantic Web technologies. The review takes into account four prominent publication venues, namely, Extended Semantic Web Conference, International Semantic Web Conference, Journal of Web Semantics, and Semantic Web Journal. The review highlights the tools, methods, types of data sources, ontologies, and publication methods, together with the challenges, limitations, and lessons learned in the knowledge graph creation processes.
- Conference Article
6
- 10.1145/319950.320059
- Nov 1, 1999
The management of multimedia information poses special requirements for multimedia information systems. Both representation and retrieval of complex and multifaceted multimedia data are not easily handled with the flat relational model and require new data models. In the last several years, object-oriented and graph-based data models have been actively pursued as approaches for handling multimedia information. In this paper the characteristics of a novel graph-based object-oriented data model are presented. This model represents the structural and behavioral aspects of the data that form multimedia information systems. It also provides for handling continuously changing user requirements and the complexity of the schema and data representation in multimedia information systems, using the schema versioning approach and perspective version abstraction.
- Research Article
63
- 10.1007/s10115-023-01860-3
- Apr 29, 2023
- Knowledge and Information Systems
Cybersecurity knowledge graphs, which represent cyber-knowledge with a graph-based data model, provide holistic approaches for processing massive volumes of complex cybersecurity data derived from diverse sources. They can help security analysts obtain cyberthreat intelligence, achieve a high level of cyber-situational awareness, discover new cyber-knowledge, visualize networks, data flow, and attack paths, and understand data correlations by aggregating and fusing data. This paper reviews the most prominent graph-based data models used in this domain, along with knowledge organization systems that define concepts and properties utilized in formal cyber-knowledge representation for both background knowledge and specific expert knowledge about an actual system or attack. It also discusses how cybersecurity knowledge graphs enable machine learning and facilitate automated reasoning over cyber-knowledge.
- Research Article
21
- 10.1145/3615952.3615956
- Aug 10, 2023
- ACM SIGMOD Record
Knowledge graphs (KGs) such as DBpedia, Freebase, YAGO, Wikidata, and NELL were constructed to store large-scale, real-world facts as (subject, predicate, object) triples, which can also be modeled as a graph in which a node (a subject or an object) represents an entity with attributes, and a directed edge (a predicate) is a relationship between two entities. Querying KGs is critical in web search, question answering (QA), semantic search, personal assistants, fact checking, and recommendation. While significant progress has been made on KG construction and curation, we have recently seen a surge of research on KG querying and QA, thanks to deep learning. The objectives of our survey are two-fold. First, research on KG querying has been conducted by several communities, such as databases, data mining, semantic web, machine learning, information retrieval, and natural language processing (NLP), with different foci and terminologies, and across diverse topics ranging from graph databases, query languages, join algorithms, and graph pattern matching to more sophisticated KG embedding and natural language questions (NLQs). We aim at uniting the different interdisciplinary topics and concepts that have been developed for KG querying. Second, many recent advances on KG and query embedding, multimodal KG, and KG-QA come from the deep learning, IR, NLP, and computer vision domains. We identify important challenges of KG querying that have received less attention from the graph database community, and from the DB community in general, e.g., incomplete KGs, semantic matching, multimodal data, and NLQs. We conclude by discussing interesting opportunities for the data management community, for instance, KG as a unified data model and vector-based query processing.
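The triple view described above can be illustrated with a small Python sketch. It stores a handful of made-up (subject, predicate, object) facts, indexes them as a directed labeled graph, and answers a simple pattern query. The facts and the helper names `build_out_edges` and `match` are hypothetical and not taken from any of the KGs named in the abstract.

```python
from collections import defaultdict

# Hypothetical toy facts as (subject, predicate, object) triples.
triples = [
    ("Marie_Curie", "born_in", "Warsaw"),
    ("Marie_Curie", "field", "Physics"),
    ("Warsaw", "capital_of", "Poland"),
    ("Pierre_Curie", "field", "Physics"),
]


def build_out_edges(triples):
    """Graph view: each subject maps to its outgoing (predicate, object) edges."""
    out_edges = defaultdict(list)
    for s, p, o in triples:
        out_edges[s].append((p, o))
    return out_edges


def match(triples, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


out_edges = build_out_edges(triples)
physicists = [s for s, _, _ in match(triples, p="field", o="Physics")]

# Two-hop query: in which country was Marie Curie born?
birth_city = match(triples, s="Marie_Curie", p="born_in")[0][2]
birth_country = match(triples, s=birth_city, p="capital_of")[0][2]
print(physicists, birth_country)
```

A linear scan is enough for a toy example; real triple stores typically keep several indexes over permutations of subject, predicate, and object.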
- Research Article
57
- 10.1109/69.469818
- Jan 1, 1995
- IEEE Transactions on Knowledge and Data Engineering
Currently, database researchers are investigating new data models in order to remedy the deficiencies of the flat relational model when applied to nonbusiness applications. Herein we concentrate on a recent graph based data model called the hypernode model. The single underlying data structure of this model is the hypernode which is a digraph with a unique defining label. We present in detail the three components of the model, namely its data structure, the hypernode, its query and update language, called HNQL, and its provision for enforcing integrity constraints. We first demonstrate that the said data model is a natural candidate for formalising hypertext. We then compare it with other graph based data models and with set based data models. We also investigate the expressive power of HNQL. Finally, using the hypernode model as a paradigm for graph based data modelling, we show how to bridge the gap between graph based and set based data models, and at what computational cost this can be done.
- Research Article
38
- 10.3390/jsan11040078
- Nov 22, 2022
- Journal of Sensor and Actuator Networks
In recent years, with the rapid development of Internet technology and applications, the scale of Internet data, which contains a significant amount of valuable knowledge, has exploded. The best methods for the organization, expression, calculation, and deep analysis of this knowledge have attracted a great deal of attention. The knowledge graph has emerged as a rich and intuitive way to express knowledge. Knowledge reasoning based on knowledge graphs is one of the current research hotspots in the field and has played an important role in wireless communication networks, intelligent question answering, and other applications. Knowledge graph-oriented knowledge reasoning aims to deduce new knowledge or identify wrong knowledge from existing knowledge. Unlike traditional knowledge reasoning, knowledge reasoning methods oriented to knowledge graphs are more diversified due to the concise, intuitive, flexible, and rich forms of knowledge expression in knowledge graphs. Based on the basic concepts of knowledge graphs and knowledge graph reasoning, this paper introduces the latest research progress in knowledge graph-oriented knowledge reasoning methods. Specifically, according to the reasoning method used, knowledge graph reasoning includes rule-based reasoning, distributed representation-based reasoning, neural network-based reasoning, and mixed reasoning. These methods are summarized in detail, and the future research directions and prospects of knowledge reasoning based on knowledge graphs are discussed.
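As a small illustration of the rule-based reasoning category mentioned above, the following Python sketch applies one transitivity rule by forward chaining until no new facts can be derived. The rule, the `located_in` predicate, the facts, and the function name `infer_transitive` are all hypothetical; the surveyed methods are considerably more general.

```python
def infer_transitive(triples, predicate):
    """Forward-chain the rule (a, p, b) and (b, p, c) => (a, p, c) to a fixpoint."""
    facts = set(triples)
    changed = True
    while changed:
        changed = False
        # Snapshot the edges for this pass so the set can grow safely.
        edges = [(s, o) for s, p, o in facts if p == predicate]
        for a, b in edges:
            for c, d in edges:
                new_fact = (a, predicate, d)
                if b == c and new_fact not in facts:
                    facts.add(new_fact)
                    changed = True
    return facts


# Hypothetical toy facts.
known = [
    ("Montmartre", "located_in", "Paris"),
    ("Paris", "located_in", "France"),
    ("France", "located_in", "Europe"),
    ("Paris", "has_landmark", "Eiffel_Tower"),
]
closed = infer_transitive(known, "located_in")
new_facts = closed - set(known)
print(sorted(new_facts))
```

Each pass adds the facts derivable from the current set, so the loop stops once a pass adds nothing. The same pattern can flag "wrong knowledge" if a derived fact contradicts a stored one, though that check is not shown here.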
- Research Article
110
- 10.1093/bib/bbac404
- Sep 23, 2022
- Briefings in Bioinformatics
Drug discovery and development is a complex and costly process. Machine learning approaches are being investigated to help improve the effectiveness and speed of multiple stages of the drug discovery pipeline. Of these, those that use Knowledge Graphs (KGs) show promise in many tasks, including drug repurposing, drug toxicity prediction and target gene-disease prioritization. In a drug discovery KG, crucial elements including genes, diseases and drugs are represented as entities, while relationships between them indicate an interaction. However, to construct high-quality KGs, suitable data are required. In this review, we detail publicly available sources suitable for use in constructing drug discovery focused KGs. We aim to help guide machine learning and KG practitioners who are interested in applying new techniques to the drug discovery field, but who may be unfamiliar with the relevant data sources. The datasets are selected via strict criteria, categorized according to the primary type of information they contain, and considered in terms of what information could be extracted to build a KG. We then present a comparative analysis of existing public drug discovery KGs and an evaluation of selected motivating case studies from the literature. Additionally, we raise numerous unique challenges and issues associated with the domain and its datasets, while also highlighting key future research directions. We hope this review will motivate the use of KGs in solving key and emerging questions in the drug discovery domain.
- Research Article
7
- 10.1097/sla.0000000000005365
- Dec 30, 2021
- Annals of Surgery
The Weight of Surgical Knowledge: Navigating Information Overload.
- Research Article
22
- 10.1145/3409481.3409485
- Jul 27, 2020
- ACM SIGWEB Newsletter
Knowledge graphs (KGs) represent facts in the form of subject-predicate-object triples and are widely used to represent and share knowledge on the Web. Their ability to represent data in complex domains augmented with semantic annotations has attracted the attention of both research and industry. Yet, their widespread adoption in various domains and their generation processes have made the contents of these resources complicated. We define knowledge graph exploration as the gradual discovery and understanding of the contents of a large and unfamiliar KG. In this paper, we present an overview of the state-of-the-art approaches for KG exploration. We divide them into three areas: profiling, search, and analysis. We argue that, while KG profiling and KG exploratory search have received considerable attention, exploratory KG analytics is still in its infancy. We conclude with an overview of promising future research directions towards the design of more advanced KG exploration techniques.
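The profiling area mentioned above can be hinted at with a very small Python sketch that summarizes an unfamiliar set of triples: how many entities it contains, which predicates occur most often, and which subjects have the most outgoing edges. The `profile` function and the sample triples are hypothetical, and real profiling tools compute far richer statistics.

```python
from collections import Counter


def profile(triples):
    """Compute a few basic summary statistics for a list of (s, p, o) triples."""
    subjects = {s for s, _, _ in triples}
    objects = {o for _, _, o in triples}
    return {
        "num_triples": len(triples),
        "num_entities": len(subjects | objects),
        "predicate_counts": Counter(p for _, p, _ in triples),
        "out_degree": Counter(s for s, _, _ in triples),
    }


# Hypothetical toy graph.
sample = [
    ("Alice", "knows", "Bob"),
    ("Alice", "works_at", "Acme"),
    ("Bob", "works_at", "Acme"),
    ("Bob", "knows", "Carol"),
    ("Carol", "knows", "Alice"),
]
summary = profile(sample)
print(summary["predicate_counts"].most_common(1), summary["num_entities"])
```

Summaries like these give a first impression of a KG before any exploratory search or analysis is attempted.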
- Book Chapter
6
- 10.1007/3-540-57530-8_8
- Jan 1, 1993
We present a wide-spectrum algebra and refinement calculus designed to allow one to reason about query optimization in graph-based data models.