LEARNING ON GRAPHS: ALGORITHMS FOR CLASSIFICATION AND SEQUENTIAL DECISIONS

Giovanni Zappella

doi:10.13130/zappella-giovanni_phd2014-03-28

Abstract

In recent years, networked data have become widespread due to the increasing importance of social networks and other web-related applications. This growing interest is driving researchers to design new algorithms for solving important problems that involve networked data. In this thesis we present a few practical yet principled algorithms for learning and sequential decision-making on graphs. Classification of networked data is an important problem that has recently received a great deal of attention from the machine learning community. This is due to its many important practical applications: computer vision, bioinformatics, spam detection and text categorization, just to cite a few of the more conspicuous examples. We focus our attention on the task called ``node classification'', often studied in the semi-supervised (transductive) setting. We present two algorithms, motivated by different theoretical frameworks. The first algorithm is studied in the well-known online adversarial setting, within which it enjoys an optimal mistake bound (up to logarithmic factors). The second algorithm is based on a game-theoretic approach, where each node of the network is maximizing its own payoff. The setting corresponds to a Graph Transduction Game in which the graph is a tree. For this special case, we show that the Nash Equilibrium of the game can be reached in linear time. We complement our theoretical findings with an extensive set of experiments using datasets from many different domains. In the second part of the thesis, we present a rapidly emerging theme in the analysis of networked data: signed networks, graphs whose edges carry a label encoding the positive or negative nature of the relationship between the connected nodes. For example, social networks and e-commerce offer several examples of signed relationships: Slashdot users can tag other users as friends or foes, Epinions users can rate each other positively or negatively, Ebay users develop trust and distrust towards sellers in the network. More generally, two individuals that are related because they rate similar products in a recommendation website may agree or disagree in their ratings. Many heuristics for link classification in social networks are based on a form of social balance summarized by the motto “the enemy of my enemy is my friend”. This is equivalent to saying that the signs on the edges of a social graph tend to be consistent with some two-clustering structure of the nodes, where edges connecting nodes from the same cluster are positive and edges connecting nodes from different clusters are negative. We present algorithms for the batch transductive active learning setting, where the topology of the graph is known in advance and our algorithms can ask for the label of some specific edges during the training phase (before starting with the predictions). These algorithms can achieve different tradeoffs between the number of mistakes during the test phase and the number of labels required during the training phase. We also presented an…

Full Text