Abstract

When modeling topics from chat messages of developer instant messaging communication, individual chat messages are short text documents. Our study aims to understand how short text topic models perform on conversations from developer instant messaging. We applied four models to nine Gitter chat rooms (with sizes ranging from ≈100 to ≈160,000 messages). To assess topic quality and identify the best-performing models, we compared topics using four topic coherence metrics. Furthermore, for a subset of the Gitter chat rooms we used two human-based assessments: intrusion tasks, with 18 experts analyzing 40 topics each, and topic naming (assigning a name to a topic that summarizes its main concept), with eight additional experts naming 60 topics each. Models performed differently in terms of coherence metrics and human assessment depending on the corpus (small, medium, or large chat room). Our findings offer recommendations for selecting and using short text topic models with developer chat messages, based on the characteristics of the models, their performance on corpora of different sizes, and different strategies for assessing topic quality.
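To illustrate the kind of automated assessment the abstract mentions, the sketch below computes one common topic coherence metric, UMass coherence (Mimno et al., 2011), over a tiny toy corpus of tokenized chat messages. This is an assumption-laden illustration: the abstract does not specify which four coherence metrics the study used, and the corpus, function names, and top-word lists here are invented for demonstration.

```python
from math import log

def umass_coherence(top_words, documents):
    """UMass coherence for one topic's ordered list of top words.

    Sums log((D(w_i, w_j) + 1) / D(w_j)) over word pairs, where D counts
    documents containing the given word(s); averaged over the pairs.
    Higher (closer to 0) means the top words co-occur more often.
    """
    docs = [set(d) for d in documents]

    def doc_freq(*words):
        # Number of documents containing all the given words.
        return sum(1 for d in docs if all(w in d for w in words))

    score, pairs = 0.0, 0
    for i, w_i in enumerate(top_words[1:], start=1):
        for w_j in top_words[:i]:
            score += log((doc_freq(w_i, w_j) + 1) / doc_freq(w_j))
            pairs += 1
    return score / pairs

# Toy corpus of tokenized chat messages (hypothetical data).
docs = [
    ["error", "build", "fail"],
    ["error", "stack", "trace"],
    ["build", "gradle", "error"],
    ["chat", "room", "gitter"],
]

coherent = umass_coherence(["error", "build"], docs)   # words that co-occur
unrelated = umass_coherence(["error", "gitter"], docs)  # words that do not
```

In this toy example the co-occurring pair scores higher than the unrelated pair, which is the basic signal coherence metrics use to rank topic models.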


