Opinion subset selection via submodular maximization

Yang Zhao,Tommy W.S Chow

doi:10.1016/j.ins.2020.12.083

Abstract

Current research on subset selection for opinion analysis assumes that their methods can retrieve the opinions expressed in documents from general text features. However, such relaxed conditions can hardly maintain the performance of the analysis in opinion mining, especially when given strict limitations on the subset size. In this paper, we propose a framework for opinion subset selection. This framework can select a small set of instances from original data to convey a subjective representation for opinion classification and regression. Compared with our framework, the conventional submodular based subset selection approach cannot capture the fine-grained opinion features expressed in the corpus. Specifically, we propose a monotone non-decreasing score function and a framework based on topic modeling and submodular maximization for filtering irrelevant information and selecting the subsets. Our work further introduces an opinion-sensitive algorithm for optimizing the proposed function for opinion subset construction. We perform extensive experiments and comparative analysis of different subset selection methods in this work. The experimental result shows that the proposed opinion subset selection framework can compress the original text training set and preserve the test set’s classification and regression metric performance at the same time.

Full Text