Abstract

Community question answering (CQA) forums are Internet-based platforms where users ask questions about a topic and other, expert users try to provide solutions. Many CQA forums, such as Quora, Stack Overflow, Yahoo! Answers, and Stack Exchange, hold large amounts of user-generated data. These data are leveraged in automated CQA ranking systems, where similar questions (and answers) are presented in response to the user's query. In this work, we empirically investigate a few aspects of this domain. First, in addition to traditional features such as Term Frequency (TF), Inverse Document Frequency (IDF), and Best Match 25 (BM25), we introduce a Bidirectional Encoder Representations from Transformers (BERT)-based feature that captures the semantic similarity between the question and the answer. Second, most existing research has focused on features extracted only from the question part; features extracted from answers have not been explored extensively. We combine both types of features in a linear fashion. Third, using our proposed concepts, we conduct an empirical investigation with different rank-learning algorithms, some of which have not previously been used in the CQA domain. On three standard CQA datasets, our proposed framework achieves state-of-the-art performance (NDCG@10 values of 0.56, 0.58, and 0.60). We also analyze the importance of the features used in our investigation. This work is expected to guide practitioners in devising a better set of features for the CQA retrieval task.
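
For readers unfamiliar with the metric reported above, the following is a minimal sketch of how NDCG@10 can be computed for a single query. It assumes the common linear-gain formulation with a log2 rank discount; the abstract does not state which NDCG variant the authors use, and the function names `dcg_at_k` and `ndcg_at_k` are illustrative, not taken from the paper.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain over the top-k items (linear gain, assumed)."""
    # Rank positions start at 1, so the discount for index i is log2(i + 2).
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances: Sequence[float], k: int = 10) -> float:
    """NDCG@k: DCG of the given ranking divided by DCG of the ideal ranking."""
    ideal_dcg = dcg_at_k(sorted(relevances, reverse=True), k)
    if ideal_dcg == 0:
        # No relevant items for this query; define the score as 0.
        return 0.0
    return dcg_at_k(relevances, k) / ideal_dcg


if __name__ == "__main__":
    # Graded relevance labels of retrieved items, in the order the ranker returned them.
    ranked_labels = [2, 0, 1, 0, 0]
    print(f"NDCG@10 = {ndcg_at_k(ranked_labels, k=10):.4f}")
```

A dataset-level score such as those quoted in the abstract would typically be the mean of this per-query value over all test queries.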
