Abstract

Knowledge extraction from large information networks has received increasing attention in recent years. Among existing methods, transductive classification is a well-known semi-supervised learning approach in which both labeled and unlabeled vertices are used in the learning process. However, transductive classification becomes impractical in large information networks, and applying network sampling techniques in the transductive setting is not trivial, because all the vertices of the original network must be classified during transductive learning, not only the vertices of the sample. In this paper, we present a framework called TCSN (Transductive Classification for Sampled Networks). TCSN supports various network sampling techniques as well as various transductive classification methods for information networks. We present a variation of the Chernoff bounds method to calculate the minimum size of a sampled network, thereby bounding the sampling error within a pre-specified tolerance level. Moreover, TCSN extends the concept of evidence accumulation to combine the results of several rounds of transductive classification into a final classification. Experimental results on different information networks reveal that TCSN achieved statistically better classification performance than transductive classification applied to the whole original network. These promising results show that TCSN enables transductive classification in large information networks without loss of quality in the knowledge extraction process.
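
The two mechanisms named above, a Chernoff-style bound on sample size and evidence accumulation across classification rounds, can be illustrated with a minimal sketch. It is not the TCSN method itself: the abstract does not give the paper's variation of the bound, so the sketch assumes the textbook additive (Hoeffding) form, in which a sample of size n satisfying 2·exp(−2·n·ε²) ≤ δ keeps the estimation error within ε with probability at least 1 − δ. The combination step is assumed to be a plain majority vote over per-round labels. The function names `min_sample_size` and `accumulate_votes` and the parameters `epsilon` and `delta` are hypothetical.

```python
import math
from collections import Counter


def min_sample_size(epsilon, delta):
    """Smallest n with 2 * exp(-2 * n * epsilon**2) <= delta.

    Assumption: textbook Hoeffding (additive Chernoff) bound, not the
    variation proposed in the paper.
    """
    if not (0 < epsilon < 1 and 0 < delta < 1):
        raise ValueError("epsilon and delta must lie strictly between 0 and 1")
    return math.ceil(math.log(2 / delta) / (2 * epsilon ** 2))


def accumulate_votes(rounds):
    """Combine several rounds of predictions by majority vote.

    rounds: list of dicts mapping vertex -> predicted label, one per round.
    A vertex missing from a round simply receives no vote from that round.
    Ties go to the label that was seen first for that vertex.
    Assumption: simple voting stands in for the paper's evidence accumulation.
    """
    votes = {}
    for labels in rounds:
        for vertex, label in labels.items():
            votes.setdefault(vertex, Counter())[label] += 1
    return {vertex: counts.most_common(1)[0][0] for vertex, counts in votes.items()}


if __name__ == "__main__":
    # Tolerance of 5 percentage points, 95% confidence.
    print(min_sample_size(0.05, 0.05))

    # Three hypothetical rounds of classification on sampled networks.
    rounds = [
        {"a": 1, "b": 0},
        {"a": 1, "b": 1, "c": 0},
        {"a": 0, "b": 1},
    ]
    print(accumulate_votes(rounds))
```

Under this bound the required sample size depends only on the tolerance and confidence level, not on the size of the original network, which is what makes sampling attractive for large networks.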

