Extension of the Rocchio Classification Method to Multi-modal Categorization of Documents in Social Media

Amin Mantrach, Jean-Michel Renders

doi:10.1007/978-3-642-33460-3_14

Abstract

Most approaches to multi-view categorization use early fusion, late fusion, or co-training strategies. We propose a novel classification method that efficiently captures the interactions across the different modes. The method is a multi-modal extension of the Rocchio classification algorithm, which is very popular in the Information Retrieval community. The extension consists of simultaneously maintaining several centroid representations for each class, in particular cross-media centroids that correspond to pairs of modes. To classify a new data point, different scores are derived from similarity measures between the point and these centroids; a global classification score is then obtained by suitably aggregating the individual scores. The method outperforms the multi-view logistic regression approach (with either the early fusion or the late fusion strategy) on a social media corpus, namely the ENRON email collection, on two very different categorization tasks: folder classification and recipient prediction.
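As a rough illustration of the centroid-and-aggregate idea, the following Python sketch builds one Rocchio-style centroid per class and per mode, scores a new document by cosine similarity to each centroid, and averages the per-mode scores. It is a minimal sketch under stated assumptions, not the method of the paper: the cross-media centroids are not sketched because the abstract does not specify how they are built, so what remains is only a plain per-mode score average. The mode names ("text", "social"), the toy vectors, the uniform weights, and all function names are hypothetical.

```python
import numpy as np


def l2_normalize(X):
    """Scale each row to unit length; all-zero rows are left unchanged."""
    X = np.asarray(X, dtype=float)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / np.where(norms == 0.0, 1.0, norms)


def fit_centroids(views, labels):
    """Return {mode: {class: centroid}}, one mean vector per class and mode."""
    labels = np.asarray(labels)
    classes = np.unique(labels).tolist()
    centroids = {}
    for mode, X in views.items():
        Xn = l2_normalize(X)
        centroids[mode] = {c: Xn[labels == c].mean(axis=0) for c in classes}
    return centroids


def cosine(u, v):
    """Cosine similarity, defined as 0 when either vector is all zeros."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom > 0 else 0.0


def predict(doc, centroids, weights=None):
    """Aggregate per-mode similarity scores into one score per class.

    Assumption: a weighted average stands in for the aggregation step;
    the abstract only says the scores are "suitably" aggregated.
    """
    modes = list(centroids)
    weights = weights or {m: 1.0 / len(modes) for m in modes}
    classes = list(centroids[modes[0]])
    scores = {
        c: sum(
            weights[m] * cosine(np.asarray(doc[m], dtype=float), centroids[m][c])
            for m in modes
        )
        for c in classes
    }
    return max(scores, key=scores.get), scores


# Toy data: four training documents seen through two hypothetical modes.
views = {
    "text": [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]],
    "social": [[1, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]],
}
labels = ["work", "work", "personal", "personal"]

centroids = fit_centroids(views, labels)
label, scores = predict({"text": [0.8, 0.2], "social": [1, 0, 0]}, centroids)
print(label, scores)
```

Under these assumptions, a centroid for a pair of modes would enter as one more scored term in `predict`, which is where the proposed method would depart from the simple averaging shown here.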

Full Text