Abstract

Expert finding in Community Question Answering (CQA) networks such as Stack Overflow is a practical task that faces a challenging problem known as the vocabulary gap. A widely used approach to overcoming this problem is the translation model. Unlike prior work, which considers only the relevance of translations to a query, we aim to diversify query translations for better coverage of query topics. In this work, we use clustering to group the translations relevant to a given query into clusters and then select representatives from each cluster as a set of diverse translations. We propose two new approaches to clustering translations. In the first, Mutual Information is primarily used as the similarity measure during clustering. In the second, the relevant translations are embedded in a topic space and then clustered in that space. After clustering, we propose two methods, one batch and one sequential, to select a diverse set of translations from the resulting clusters. The batch method selects the most relevant translations from each cluster, in number proportional to the relevance of that cluster to the user query. The sequential method iteratively builds the most diverse set of translations, taking the previously selected ones into account. Finally, to rank users, a regression model is used to learn how expert and non-expert users differ in their use of a set of diverse translations in their documents. Experiments on a large dataset generated from Stack Overflow demonstrate that the proposed methods improve ranking performance over baselines in expert finding.
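
To make the batch selection step concrete, the following is a minimal sketch and not the authors' implementation. It assumes that each cluster is a list of (translation, relevance) pairs, that a cluster's relevance to the query is the sum of its members' scores, and that fractional quotas are rounded with the largest-remainder rule; the abstract specifies none of these details. The function name `batch_select` and the example words and scores are hypothetical.

```python
from typing import List, Tuple

# One cluster = list of (translation word, relevance to the query).
Cluster = List[Tuple[str, float]]


def batch_select(clusters: List[Cluster], k: int) -> List[str]:
    """Pick up to k translations, giving each cluster a share of the k slots
    proportional to its relevance (assumed here: sum of member scores)."""
    weights = [sum(score for _, score in cluster) for cluster in clusters]
    total = sum(weights)
    if total <= 0 or k <= 0:
        return []

    # Proportional quotas, rounded with the largest-remainder rule
    # (an assumption) so that the quotas sum to exactly k.
    shares = [k * w / total for w in weights]
    quotas = [int(s) for s in shares]
    leftover = k - sum(quotas)
    by_remainder = sorted(
        range(len(clusters)),
        key=lambda i: shares[i] - quotas[i],
        reverse=True,
    )
    for i in by_remainder[:leftover]:
        quotas[i] += 1

    # Within each cluster, take the most relevant translations first.
    # A cluster smaller than its quota simply contributes all its members.
    selected: List[str] = []
    for cluster, quota in zip(clusters, quotas):
        ranked = sorted(cluster, key=lambda pair: pair[1], reverse=True)
        selected.extend(word for word, _ in ranked[:quota])
    return selected


# Hypothetical clusters for a query about Java concurrency.
example_clusters = [
    [("thread", 0.5), ("concurrency", 0.375), ("mutex", 0.125)],
    [("jvm", 0.375), ("heap", 0.125)],
]
print(batch_select(example_clusters, 3))  # ['thread', 'concurrency', 'jvm']
```

With these made-up scores the first cluster is twice as relevant as the second, so it receives two of the three slots. The sequential method would instead choose one translation at a time, scoring each candidate against those already selected.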
