Abstract

The hypergraph-of-entity is a joint representation model for terms, entities and their relations, used as an indexing approach in entity-oriented search. In this work, we characterize the structure of the hypergraph at both the microscopic and macroscopic scales, as well as over time, as the number of documents increases. We use a random-walk-based approach to estimate shortest distances and node sampling to estimate clustering coefficients. We also propose the calculation of a general mixed hypergraph density measure based on the corresponding bipartite mixed graph. We analyze these statistics for the hypergraph-of-entity, finding that hyperedge-based node degrees follow a power law, while node-based node degrees and hyperedge cardinalities are log-normally distributed. We also find that most statistics tend to converge after an initial period of accentuated growth in the number of documents. We then repeat the analysis over three extensions (materialized through synonym, context, and tf_bin hyperedges) in order to assess their structural impact on the hypergraph. Finally, we focus on the application-specific aspects of the hypergraph-of-entity, in the domain of information retrieval. We analyze the correlation between retrieval effectiveness and the structural features of the representation model, proposing ranking and anomaly indicators as useful guides for modifying or extending the hypergraph-of-entity.

Highlights

  • In the “Analyzing the hypergraph-of-entity base model” section, we present the results of a characterization experiment of the hypergraph-of-entity for a subset of the INEX (INitiative for the Evaluation of XML Retrieval) 2009 Wikipedia collection and, in the “Analyzing the structural impact of different index extensions” section, we explore the effect of including synonyms, contextual similarity, or term frequency (TF) bins in the structure of the hypergraph

  • We expanded on the characterization work by analyzing different model extensions based on synonymy, contextual similarity, and a new concept of TF-bins, and we measured the run time of several operations like indexing and the computation of properties

Introduction

Complex networks have frequently been studied as graphs, but only recently has attention been given to the study of complex networks as hypergraphs (Estrada and Rodriguez-Velazquez 2005). The hypergraph-of-entity (Devezas and Nunes 2019) is a hypergraph-based model used to represent combined data (Bast et al. 2016, §2.1.3). That is, it is a joint representation of corpora and knowledge bases, integrating terms, entities and their relations. The hypergraph-of-entity, together with its random walk score (Devezas and Nunes 2019, §4.2.2), is an attempt to generalize several tasks of entity-oriented search. These include ad hoc document retrieval and ad hoc entity retrieval, as well as the recommendation-like tasks of related entity finding and entity list completion.
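To make the joint representation more concrete, the following is a minimal Python sketch of a toy hypergraph whose nodes are terms and entities, together with the three microscopic statistics named in the abstract. It is an illustration under stated assumptions, not the authors' implementation: the hyperedge names, node labels and function names are all hypothetical, every hyperedge is treated as undirected (the hypergraph-of-entity itself is a mixed hypergraph), the hyperedge-based degree is read here as the number of hyperedges incident to a node, and the node-based degree as the number of distinct nodes it shares a hyperedge with.

```python
from collections import defaultdict

# Hypothetical toy data: each hyperedge is a set of term and entity nodes.
# All hyperedges are treated as undirected, which simplifies the mixed
# (directed and undirected) structure of the actual model.
hyperedges = {
    "doc1": {"term:hypergraph", "term:search", "entity:Information_retrieval"},
    "doc2": {"term:search", "entity:Information_retrieval", "entity:Search_engine"},
    "related1": {"entity:Information_retrieval", "entity:Search_engine"},
}


def hyperedge_cardinalities(hyperedges):
    """Number of nodes contained in each hyperedge."""
    return {name: len(nodes) for name, nodes in hyperedges.items()}


def hyperedge_based_degrees(hyperedges):
    """Number of hyperedges incident to each node (assumed definition)."""
    degrees = defaultdict(int)
    for nodes in hyperedges.values():
        for node in nodes:
            degrees[node] += 1
    return dict(degrees)


def node_based_degrees(hyperedges):
    """Number of distinct nodes sharing a hyperedge with each node (assumed definition)."""
    neighbors = defaultdict(set)
    for nodes in hyperedges.values():
        for node in nodes:
            neighbors[node] |= nodes - {node}
    return {node: len(adjacent) for node, adjacent in neighbors.items()}


if __name__ == "__main__":
    print(hyperedge_cardinalities(hyperedges))
    print(hyperedge_based_degrees(hyperedges))
    print(node_based_degrees(hyperedges))
```

In this toy example the two degree notions already diverge: term:search belongs to two hyperedges but is adjacent to three distinct nodes, which is why the two degree distributions can take different shapes on a real collection.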
