Community question answering (cQA) platforms, like Yahoo! Answers, provide standard search APIs for browsing past questions and answers. Since the largest cQA services maintain massive sets of resolved questions, effective methods for revitalizing the information contained in their archives are becoming increasingly important for serving the needs of their members as promptly and reliably as possible.

In this paper, we present a novel strategy for effectively browsing cQA archives. The core idea is to induce the semantic classes of question-like search queries (e.g., “rib pain after ovulation” and “iron oxide household”) from the contextual information provided by inferred views of their respective search sessions, namely views that model the previous queries entered by the same user.

When searching cQA archives, members do not assign semantic classes to their queries. We therefore treat the cQA as a knowledge base that defines a taxonomy of semantic classes for its posted questions and thus provides an explicit mapping between search queries and these questions. Following a supervised learning approach, we investigate and analyze the most salient features needed to exploit this cQA mapping automatically.

We carried out a large number of experiments using a rich set of attributes extracted from a large dataset automatically acquired from Yahoo! Answers. Our results confirm what was often only intuitively assumed: larger contexts actually help to detect semantic relations implicitly expressed in the sequences of queries that users submit during their search sessions. In particular, we find that Explicit Semantic Analysis is extremely helpful for inferring discriminative semantic cues that narrow, and thereby determine, the semantic range of question-like search queries. Conversely, building traditional bag-of-words models on top of the prior queries in the session was detrimental.
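To illustrate why session context plus Explicit Semantic Analysis (ESA) can disambiguate a query, here is a minimal sketch. It is not the authors' feature pipeline, which the abstract does not specify. It assumes a simplified ESA: each word maps to TF-IDF-weighted "concepts" (in real ESA, Wikipedia articles), and a text's concept vector is the sum of its words' weights, with no normalization. The toy concept corpus (`CONCEPTS`) and the helpers `build_esa_index`, `esa_vector` and `session_features` are hypothetical. The supervised classifier that would consume these features is omitted.

```python
import math
from collections import Counter

# Hypothetical toy concept corpus standing in for Wikipedia articles in ESA.
CONCEPTS = {
    "Menstrual_cycle": "ovulation cycle pain hormone period",
    "Rib_cage": "rib bone chest pain injury",
    "Iron_oxide": "iron oxide rust household chemical",
}

def tokenize(text):
    return text.lower().split()

def build_esa_index(concepts):
    """Inverted index mapping word -> {concept: TF-IDF weight}."""
    n = len(concepts)
    tfs = {c: Counter(tokenize(t)) for c, t in concepts.items()}
    df = Counter(w for tf in tfs.values() for w in tf)  # document frequency
    index = {}
    for c, tf in tfs.items():
        for w, f in tf.items():
            index.setdefault(w, {})[c] = f * math.log(n / df[w])
    return index

def esa_vector(text, index):
    """Simplified ESA: sum the concept weights of every word in the text."""
    vec = Counter()
    for w in tokenize(text):
        for c, weight in index.get(w, {}).items():
            vec[c] += weight
    return vec

def session_features(query, previous_queries, index):
    """Hypothetical features: ESA vector of the query plus that of its session context."""
    feats = {"q:" + c: w for c, w in esa_vector(query, index).items()}
    ctx = Counter()
    for prev in previous_queries:
        ctx.update(esa_vector(prev, index))  # accumulate context concept weights
    feats.update({"ctx:" + c: w for c, w in ctx.items()})
    return feats

index = build_esa_index(CONCEPTS)
feats = session_features("rib pain after ovulation", ["ovulation cycle length"], index)
for name, weight in sorted(feats.items()):
    print(f"{name}: {weight:.3f}")
```

In this toy setting the query alone scores `Menstrual_cycle` and `Rib_cage` equally. The previous query in the session adds weight only to `Menstrual_cycle`, and that is the kind of discriminative cue a classifier could exploit.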