Abstract

Redescription mining aims to find subsets of instances that can be re-described, that is, characterized in multiple ways, using one or more disjoint sets of attributes that describe a set of instances. Current redescription mining algorithms work either with tabular data or with relational data, where binary relations between objects allow descriptions to be represented as graphs. In this work, we propose a novel type of redescription mining methodology that combines tabular data with background network information, where the nodes of the network are instances in the tabular data. The background information is used to locate subsets of instances with some desired network property, whereas the tabular data are used to re-describe these subsets. The methodology can be classified as constraint-based redescription mining, in which we allow for a large variety of complex network-based soft constraints. The proposed framework is extensible, so any network-related measure can be used to localize subsets of instances of interest. In addition, different types of networks, such as undirected graphs, directed graphs, graph sequences, or multiplex networks, can be used as background information. We demonstrate the applicability of the proposed framework on three use-case datasets involving country trade networks, biological (gene spatial) networks, and social networks. The experimental evaluation demonstrates that the proposed approach outperforms existing general redescription mining approaches with respect to the intensity of network properties of the re-described instances, without loss of accuracy and in most cases with improved redescription accuracy.
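
To make the idea of a network-based soft constraint concrete, the following is a minimal illustrative sketch, not the method from the paper. It assumes that redescription accuracy is measured by the Jaccard index of the two queries' support sets (the usual choice in redescription mining) and that the desired network property is the edge density of the subgraph induced by the re-described instances. The function names, the `weight` parameter, and the linear combination are all hypothetical choices made for illustration.

```python
def jaccard(support_a, support_b):
    """Jaccard index of two support sets (assumed accuracy measure)."""
    union = support_a | support_b
    return len(support_a & support_b) / len(union) if union else 0.0


def induced_density(nodes, edges):
    """Edge density of the undirected subgraph induced by `nodes`.

    `edges` is a set of (u, v) tuples; each undirected edge is assumed
    to appear once and self-loops are assumed to be absent.
    """
    n = len(nodes)
    if n < 2:
        return 0.0
    inside = sum(1 for u, v in edges if u in nodes and v in nodes)
    return inside / (n * (n - 1) / 2)


def soft_constrained_score(support_a, support_b, edges, weight=0.5):
    """Hypothetical score mixing accuracy with a network property.

    The network term acts as a soft constraint: it raises or lowers the
    score but never excludes a candidate outright.
    """
    common = support_a & support_b  # instances re-described by both queries
    accuracy = jaccard(support_a, support_b)
    network = induced_density(common, edges)
    return (1 - weight) * accuracy + weight * network


# Toy background network over four instances (hypothetical data).
edges = {("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")}
support_q1 = {"A", "B", "C"}       # instances satisfying the first query
support_q2 = {"A", "B", "C", "D"}  # instances satisfying the second query

example_score = soft_constrained_score(support_q1, support_q2, edges)
```

In this toy case the two queries share three of four instances and those three form a triangle in the network, so both terms are high. Any other network measure could be substituted for `induced_density` without changing the rest of the sketch, which mirrors the extensibility described above.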
