When modeling topics from chat messages in developer instant messaging communication, individual chat messages are short text documents. Our study aims to understand how short text topic models perform on conversations from developer instant messaging. We applied four models to nine Gitter chat rooms (with sizes ranging from ≈100 to ≈160,000 messages). To assess the quality of topics and identify the best-performing models, we compared topics based on four topic coherence metrics. Furthermore, for a subset of Gitter chat rooms we used two human-based assessments: intrusion tasks, with 18 experts analyzing 40 topics each, and topic naming (assigning a name to a topic that summarizes its main concept), with eight additional experts naming 60 topics each. Models performed differently in terms of coherence metrics and human assessment depending on the corpus (small, medium, or large chat room). Our findings offer recommendations for selecting and using short text topic models with developer chat messages, based on the characteristics of the models, their performance with different corpus sizes, and different strategies for assessing topic quality.