A scalable approach for statistical learning in semantic graphs

Yi Huang, Maximilian Nickel, Volker Tresp, Hans-Peter Kriegel, Achim Rettinger

doi:10.3233/sw-130100

Abstract

Increasingly, data is published in the form of semantic graphs. The most notable example is the Linked Open Data (LOD) initiative, in which a growing number of data sources are published in the Semantic Web's Resource Description Framework and are linked so that they reference one another. In this paper we apply machine learning to semantic graph data and argue that scalability and robustness can be achieved via an urn-based statistical sampling scheme. We apply the urn model to the SUNS framework, which is based on multivariate prediction, and argue that multivariate prediction approaches are the most suitable for dealing with the resulting high-dimensional, sparse data matrix. Within the statistical framework, the approach scales up to large domains and is able to deal with highly sparse relationship data. We summarize experimental results using a friend-of-a-friend data set and a data set derived from DBpedia. We then describe in more detail novel experiments on disease gene prioritization using LOD data sources. The experiments confirm the ease of use, the scalability, and the good performance of the approach.

Full Text