Abstract

Relational autocompletion is the problem of automatically filling out some missing values in multi-relational data. We tackle this problem within the probabilistic logic programming framework of Distributional Clauses (DCs), which supports both discrete and continuous probability distributions. Within this framework, we introduce DiceML, an approach to learn both the structure and the parameters of DC programs from relational data (with possibly missing data). To realize this, DiceML integrates statistical modeling and DCs with rule learning. The distinguishing features of DiceML are that it (1) tackles autocompletion in relational data, (2) learns DCs extended with statistical models, (3) deals with both discrete and continuous distributions, (4) can exploit background knowledge, and (5) uses an expectation-maximization-based (EM) algorithm to cope with missing data. The empirical results show the promise of the approach, even when there is missing data.

Highlights

  • Spreadsheets are arguably the most accessible tool for data analysis, and millions of people use them

  • We study the problem of relational autocompletion, where the goal is to automatically fill out the entries specified by users in multiple related tables

  • Afterwards, we show how to learn a set of distributional logic trees (DLTs), that is, a joint model program (JMP), in an iterative EM-like manner, which is useful for dealing with missing values


Summary

Introduction

Spreadsheets are arguably the most accessible tool for data analysis, and millions of people use them. We use Statistical Relational AI (StarAI; Kersting et al. 2011) to analyze such data. More specifically, we study the problem of relational autocompletion, where the goal is to automatically fill out the entries specified by users in multiple related tables. This problem setting is simple yet challenging, and it is viewed as an essential component of an automatic data scientist (De Raedt et al. 2018). We tackle this problem by learning a probabilistic logic program that defines the joint probability distribution over the attributes of all instances in the multiple related tables. This program can then be used to estimate the most likely values of the cells of interest.
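
To make this concrete, the following is a minimal sketch of what such a Distributional Clause program might look like, written in the usual DC syntax (head ~ distribution := body, where ~= reads off the value of a random variable). The predicates and distributions used here (student/1, course/1, takes/2, iq/1, difficulty/1, grade/2, and the numeric parameters) are illustrative assumptions and are not taken from the paper.

    % Illustrative DC program; all predicates and numbers are hypothetical.

    % Discrete distribution over the difficulty of a course.
    difficulty(C) ~ discrete([0.6:easy, 0.4:hard]) := course(C).

    % Continuous (Gaussian) distribution over a student's IQ.
    iq(S) ~ gaussian(100.0, 15.0) := student(S).

    % The grade of a student in an easy course: a Gaussian whose mean
    % is a (hypothetical) linear function of the student's IQ.
    grade(S, C) ~ gaussian(Mu, 5.0) :=
        takes(S, C),
        difficulty(C) ~= easy,
        iq(S) ~= I,
        Mu is 0.5 * I + 20.

Given tables of students, courses, and partially filled grades, autocompletion under such a (learned) program amounts to inferring the most likely value of each unobserved random variable, for example the grade of a particular student in a particular course, under the joint distribution the program defines.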
