A Sampling-Based Framework for Transductive Classification in Information Networks

Brucce N Dos Santos, Ricardo M Marcacini, Solange O Rezende

doi:10.1109/bracis.2019.00120

Abstract

Knowledge extraction from large information networks has received increasing attention in recent years. Among existing methods for knowledge extraction, transductive classification is a well-known semi-supervised learning method in which both labeled and unlabeled vertices are used in the learning process. However, transductive classification becomes impractical in large information networks, and applying network sampling techniques in the transductive setting is not trivial: all vertices of the original network must be classified during transductive learning, not only the vertices of the sample. In this paper, we present a framework called TCSN (Transductive Classification for Sampled Networks). TCSN supports various network sampling techniques as well as various transductive classification methods for information networks. We present a variation of the Chernoff Bounds method to calculate the minimum size of a sampled network, thereby bounding the sampling error within a pre-specified tolerance level. Moreover, TCSN extends the concept of evidence accumulation to combine the results of several rounds of transductive classification into a final classification. Experimental results on different information networks reveal that TCSN statistically outperformed classification performed on the whole original network. These promising results show that TCSN enables transductive classification in large information networks without loss of quality in the knowledge extraction process.
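To make the two ingredients named in the abstract concrete, the following is a minimal illustrative sketch, not the authors' implementation. The function names `min_sample_size` and `accumulate_votes` are hypothetical. The sample-size formula is the textbook Hoeffding/Chernoff-style bound, not the variation the paper presents, and the combination rule is plain majority voting, which is only one simple way evidence from several rounds might be accumulated.

```python
import math
from collections import Counter, defaultdict


def min_sample_size(epsilon, delta):
    """Textbook Hoeffding/Chernoff-style bound (an assumption, not the paper's variant).

    Returns the smallest n for which an empirical proportion estimated from
    n independent draws deviates from the true proportion by more than
    `epsilon` with probability at most `delta`.
    """
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))


def accumulate_votes(rounds):
    """Combine several rounds of classification by majority vote.

    `rounds` is an iterable of dicts, each mapping a vertex to the label
    predicted for it in one round on one sampled network. A vertex may be
    missing from some rounds; it is decided by the rounds that cover it.
    """
    votes = defaultdict(Counter)
    for predictions in rounds:
        for vertex, label in predictions.items():
            votes[vertex][label] += 1
    # Pick the most frequent label per vertex.
    return {vertex: counts.most_common(1)[0][0] for vertex, counts in votes.items()}


if __name__ == "__main__":
    # Sample size for a 5% tolerance with 95% confidence under the generic bound.
    print(min_sample_size(epsilon=0.05, delta=0.05))

    # Three hypothetical rounds over overlapping samples of vertices a, b, c.
    rounds = [
        {"a": "x", "b": "y"},
        {"a": "x", "c": "y"},
        {"a": "z", "b": "y"},
    ]
    print(accumulate_votes(rounds))
```

In this sketch every vertex of the original network receives a final label as long as it is covered by at least one round, which mirrors the requirement that all vertices, not only those in a single sample, be classified.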

Full Text