Abstract

Stack Overflow is a question-and-answer community that provides rich information about computer programming and technology for software developers. Users can ask and answer questions on a wide range of programming topics, as well as search for problems that other users have faced and solutions that other users have suggested. From the viewpoint of a technology product owner, Stack Overflow surfaces the issues that product users encounter, which serves as valuable input to the product improvement process. This paper proposes an automated approach to classifying questions posted on Stack Overflow about a particular kind of product, namely database products. The categories of questions are defined at two levels: problem and subproblem. The problem level includes development, installation, and performance tuning, while the subproblem level consists of design, limitation, and discussion. By cross-combining the two levels, questions can be classified into nine problem-subproblem classes. Natural language processing and text classification are used with several machine learning algorithms: Naïve Bayes, Decision Tree, Extra Trees, Random Forest, Logistic Regression, Stochastic Gradient Descent, Deep Learning Neural Network, and Convolutional Neural Network. The best classifiers for all classes are then used in a web-based tool that can tag each question with a problem-subproblem class and report the number of problems that users of a database product have posted. This information can help the owner of a database product plan product maintenance and evolution.
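
The two-level scheme and the classification step can be illustrated with a minimal, dependency-free sketch. It is not the paper's implementation: the whitespace tokenizer, the `problem/subproblem` label format, the `TinyNaiveBayes` class, and the four training questions are all assumptions made for illustration, and only one of the listed algorithms (Naïve Bayes, here hand-rolled with add-one smoothing) is shown.

```python
import math
from collections import Counter, defaultdict
from itertools import product

PROBLEMS = ["development", "installation", "performance tuning"]
SUBPROBLEMS = ["design", "limitation", "discussion"]

# Cross-combine the two levels into nine problem-subproblem classes.
# The "problem/subproblem" string format is an assumption of this sketch.
CLASSES = [f"{p}/{s}" for p, s in product(PROBLEMS, SUBPROBLEMS)]


def tokenize(text):
    """Lowercase and split on whitespace; a stand-in for real NLP preprocessing."""
    return text.lower().split()


class TinyNaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing (hypothetical helper)."""

    def fit(self, texts, labels):
        self.label_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.label_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for word in tokenize(text):
                if word in self.vocab:  # words never seen in training are skipped
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up training questions, for illustration only.
train = [
    ("how should i design the schema for my orders table", "development/design"),
    ("what is the maximum number of columns allowed per table", "development/limitation"),
    ("installer fails with missing library error on linux", "installation/discussion"),
    ("query is slow which index should i design", "performance tuning/design"),
]
model = TinyNaiveBayes().fit([t for t, _ in train], [l for _, l in train])

# Tag one new question, then count questions per predicted class,
# mirroring the per-class problem counts that the tool reports.
predicted = model.predict("installer error on linux")
tally = Counter(model.predict(t) for t, _ in train)
```

With real data, the tally would be computed over all questions collected for a given database product, and each of the nine classes could use whichever algorithm performed best for it.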
