Abstract

Thanks to crowdsourcing, StackOverflow, the most popular Question and Answer (Q&A) platform in software engineering, holds a large amount of useful information. This information can be treated as numerous URLs (Uniform Resource Locators), which fall into two categories: URLs of Q&As and URLs in Q&As. The domain of the former is StackOverflow itself, while the domains of the latter are miscellaneous, such as personal blogs. Although each Q&A has been manually assigned tags, the relations between URLs and tags are not clear enough. In this paper, we propose SOLinker, a method for building semantic links between URLs and tags. First, SOLinker identifies the proper relation between a tag and a URL from a predefined relation set, a task modeled as a text classification problem. Features are extracted from the content of the Q&A, the URL, and the tag list, and the classification algorithm is either Logistic Regression or Gradient Boosting Decision Tree, depending on the category of the URL. Second, there is a partial tagging problem: for a URL in a Q&A, only some of the Q&A's tags relate to that URL. To address this problem, we propose a semantic analysis method that analyzes the URL and its context from both implicit and explicit aspects; SOLinker then infers the proper tags using the label propagation technique. Results show that our method is feasible and practical for constructing semantic links between tags and URLs of/in Q&As. In particular, the F-score of semantic relation identification is around 78%, 5% higher than that of the other existing method, and the F-score of partial tagging resolution is around 88%.
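
The split between URLs of Q&As and URLs in Q&As can be illustrated with a minimal sketch. It is not SOLinker's implementation: the function name `categorize_url`, the host set `SO_HOSTS`, and the category labels are hypothetical, and the sketch assumes that the domain alone is enough to tell the two categories apart, as the description above suggests.

```python
from urllib.parse import urlparse

# Assumption: the hosts counted as "StackOverflow itself"; the abstract does not list them.
SO_HOSTS = {"stackoverflow.com", "www.stackoverflow.com"}


def categorize_url(url: str) -> str:
    """Return 'url_of_qa' if the URL's host is StackOverflow, else 'url_in_qa'.

    Domain-based heuristic only (an assumption of this sketch).
    """
    # urlparse lowercases the hostname; it is None when no host can be parsed.
    host = urlparse(url).hostname or ""
    return "url_of_qa" if host in SO_HOSTS else "url_in_qa"


if __name__ == "__main__":
    for u in ("https://stackoverflow.com/questions/11227809",
              "https://example.org/blog/post"):
        print(u, "->", categorize_url(u))
```

In a pipeline like the one described, the returned category would then decide which of the two classifiers handles the URL; the abstract does not state which classifier is paired with which category, so that mapping is left out here.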
