Author Reputation Measurement on Question and Answer Sites by the Classification of Author-Generated Content

Erhan Sezerer,Ali Acar,Samet Tenekeci,Bora Baloğlu,Selma Tekir

doi:10.1142/s0218194021500479

Abstract

In the field of software engineering, practitioners’ share in the constructed knowledge cannot be underestimated and is mostly in the form of grey literature (GL). GL is a valuable resource though it is subjective and lacks an objective quality assurance methodology. In this paper, a quality assessment scheme is proposed for question and answer (Q&A) sites. In particular, we target stack overflow (SO) and stack exchange (SE) sites. We model the problem of author reputation measurement as a classification task on the author-provided answers. The authors’ mean, median, and total answer scores are used as inputs for class labeling. State-of-the-art language models (BERT and DistilBERT) with a softmax layer on top are utilized as classifiers and compared to SVM and random baselines. Our best model achieves [Formula: see text] accuracy in binary classification in SO design patterns tag and [Formula: see text] accuracy in SE software engineering category. Superior performance in SE software engineering can be explained by its larger dataset size. In addition to quantitative evaluation, we provide qualitative evidence, which supports that the system’s predicted reputation labels match the quality of provided answers.

Full Text