Scalable Construction and Reasoning of Massive Knowledge Bases

Xiang Ren,William Yang Wang,Nanyun Peng

doi:10.18653/v1/n18-6003

Abstract

In today’s information-based society, there is abundant knowledge out there carried in the form of natural language texts (e.g., news articles, social media posts, scientific publications), which spans across various domains (e.g., corporate documents, advertisements, legal acts, medical reports), which grows at an astonishing rate. Yet this knowledge is mostly inaccessible to computers and overwhelming for human experts to absorb. How to turn such massive and unstructured text data into structured, actionable knowledge, and furthermore, how to teach machines learn to reason and complete the extracted knowledge is a grand challenge to the research community. Traditional IE systems assume abundant human annotations for training high quality machine learning models, which is impractical when trying to deploy IE systems to a broad range of domains, settings and languages. In the first part of the tutorial, we introduce how to extract structured facts (i.e., entities and their relations for types of interest) from text corpora to construct knowledge bases, with a focus on methods that are weakly-supervised and domain-independent for timely knowledge base construction across various application domains. In the second part, we introduce how to leverage other knowledge, such as the distributional statistics of characters and words, the annotations for other tasks and other domains, and the linguistics and problem structures, to combat the problem of inadequate supervision, and conduct low-resource information extraction. In the third part, we describe recent advances in knowledge base reasoning. We start with the gentle introduction to the literature, focusing on path-based and embedding based methods. We then describe DeepPath, a recent attempt of using deep reinforcement learning to combine the best of both worlds for knowledge base reasoning.

Highlights

Motivation.The success of data mining and artificial intelligence technology is largely attributed to the efficient and effective analysis of structured data
We describe DeepPath, a recent attempt of using deep reinforcement learning to combine the best of both worlds for knowledge base reasoning
The majority of existing data generated in our society is unstructured, big data leads to big opportunities to uncover structures of real-world entities, attributes, relations from massive text corpora

Summary

Introduction

We will discuss the following key issues: (1) data-driven approaches for mining quality phrases from massive, unstructured text corpora; (2) entity recognition and typing: preliminaries, challenges, and methodologies; and (3) relation extraction: previous efforts, limitations, recent progress, and a joint entity and relation extraction method using distant supervision; (4) multi-task and multi-domain learning for lowresource information extraction; (5) distill linguistic knowledge into neural models to help low-resource information extraction. Extracting structured information is key to understanding messy and scattered raw data, and effective reasoning tools are critical for the use of KBs in downstream tasks like QA. This tutorial will present an organized picture of recent research on knowledge base construction and reasoning.

Outline

Pattern-based bootstrapping methods

Pattern-based boostrapping methods

Knowledge Base Reasoning

Organizers

Conference tutorial