Abstract

Recency ranking in Community-based Question Answering (CQA) refers to placing recent answers in the top positions of a list. Here, being recent is not a matter of how new an answer’s creation or editing date is, but of how current its content is. A good ranking should also consider answer quality, since a current but low-quality answer may be useless. Similarly, a high-quality answer, with adequate text and references, may be worthless if its information is obsolete. Combining these two aspects (recency and quality) is crucial, as users usually want current solutions and need fast, easy access (top positions in the ranking) to the best answers in order to solve their problems quickly. CQA sites usually provide voting mechanisms so that users can indicate the best-quality answers. However, voting does not take the recency of answers into account, and it is a slow and subjective process that does not keep up with the dynamism of new content. Therefore, we propose an automatic approach that considers both the quality and the recency of an answer when generating the ranking. We use textual and non-textual features that indicate answer quality and recency, extracted from users’ answers in the CQA environment as a whole. In our approach, quality is used to classify answers as good or poor according to a threshold value, producing two sets of answers: high quality and low quality. Both sets are then sorted by recency. Finally, the two sets are concatenated to produce the final ranking, so that the best and most current answers are in the top positions. To verify the effectiveness of our proposal, we performed a case study on the Stack Overflow CQA with a set of experiments using different combinations of features and different learning to rank algorithms.
Our main contributions are: (1) an approach to ranking the answers in a question dataset by the recency and quality of each answer; (2) a thorough evaluation of 9 learning to rank algorithms, showing that Coordinate Ascent and LambdaMART perform best in this task; (3) a feature analysis showing that features related to the age of the answer help improve ranking performance when recency and quality are taken into account. Furthermore, as far as we know, this is the first work to consider the recency of an answer in this task.
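
The threshold-then-sort procedure described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Answer` class, the `quality` and `recency` scores, the `rank_answers` function, and the default threshold of 0.5 are all hypothetical names and values chosen for illustration, and the sketch assumes that both scores have already been computed (for example, by a learned model over the textual and non-textual features) and that the high-quality set is placed first when the two sets are concatenated.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Answer:
    answer_id: int
    quality: float  # hypothetical precomputed quality score (assumed in [0, 1])
    recency: float  # hypothetical precomputed recency score; higher = more current content


def rank_answers(answers: List[Answer], quality_threshold: float = 0.5) -> List[Answer]:
    """Split answers by a quality threshold, sort each set by recency, concatenate.

    The threshold value is an assumption made for this sketch only.
    """
    # Step 1: classify answers as high or low quality using the threshold.
    high = [a for a in answers if a.quality >= quality_threshold]
    low = [a for a in answers if a.quality < quality_threshold]

    # Step 2: sort each set by recency, most current first.
    high.sort(key=lambda a: a.recency, reverse=True)
    low.sort(key=lambda a: a.recency, reverse=True)

    # Step 3: concatenate, high-quality answers first.
    return high + low


# Toy data, invented purely for illustration.
answers = [
    Answer(1, quality=0.9, recency=0.2),
    Answer(2, quality=0.3, recency=0.95),
    Answer(3, quality=0.7, recency=0.8),
    Answer(4, quality=0.4, recency=0.1),
]
ranked_ids = [a.answer_id for a in rank_answers(answers)]
print(ranked_ids)  # [3, 1, 2, 4]
```

In this toy example, answer 2 has the most current content but falls below the quality threshold, so it is ranked after both high-quality answers; within each set, the more current answer comes first.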
