Abstract

Text message streams produced by instant messengers and Internet Relay Chat (IRC) pose interesting and challenging problems for information technologies. Extracting the conversations in such chat message streams is beneficial for information management and knowledge discovery. However, the messages in a text message stream are usually very short and incomplete, and monitoring thousands of continuous chat sessions demands efficiency. As a result, many existing text mining methods encounter difficulties. This paper focuses on conversation extraction in dynamic text message streams. We design a dynamic representation for messages that combines the text content information and linguistic features of the message stream. A memory structure of reversed maximal similar relationships is developed to support renewable assignments when grouping messages into conversations. Finally, building on these methods, we propose a double time window algorithm to extract conversations from dynamic text message streams. Experiments on a real dataset show that our method outperforms two baseline methods introduced in a recent related paper by about 47% and 15%, respectively, in terms of F-measure.
