Abstract

With the development of the new media industry, comment-based user interaction has become routine in live broadcasting. User comments usually take the form of short, free-style texts that contain newly coined Internet words. General-purpose word segmentation methods do not adapt well to Chinese short texts in new media comments. This paper proposes a novel Chinese short text segmentation method that addresses the problem of self-adaptive word segmentation granularity. A New Media Comment Short Text Dataset (NMCD) is built for this research, along with word vectors that cover cyber new words and entity words. Our optimized bidirectional Long Short-Term Memory (LSTM) model, based on an attention mechanism and transfer learning, keeps a number and its unit together as a single token after segmentation. The experimental results show that the F1-score is improved by 21.43%. The proposed word segmentation method can be applied efficiently in new media comment analysis systems.
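To make the granularity issue and the F1-score concrete, here is a minimal sketch of the standard word-level F1 computation for segmentation, in which a predicted word counts as correct only if its character span matches a gold word exactly. It is not the paper's evaluation script: the function names `to_spans` and `segmentation_f1` and the example sentence are assumptions made for illustration.

```python
def to_spans(words):
    """Convert a list of words into a set of (start, end) character spans."""
    spans = set()
    start = 0
    for word in words:
        end = start + len(word)
        spans.add((start, end))
        start = end
    return spans


def segmentation_f1(gold, pred):
    """Return (precision, recall, f1) for one predicted segmentation.

    A predicted word is correct only if its span matches a gold word exactly.
    """
    if "".join(gold) != "".join(pred):
        raise ValueError("gold and pred must segment the same text")
    gold_spans = to_spans(gold)
    pred_spans = to_spans(pred)
    correct = len(gold_spans & pred_spans)
    if correct == 0:
        return 0.0, 0.0, 0.0
    precision = correct / len(pred_spans)
    recall = correct / len(gold_spans)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example: the gold standard keeps the number and its unit
# together ("3斤"), while the prediction splits them ("3", "斤").
gold = ["我", "买", "了", "3斤", "苹果"]
pred = ["我", "买", "了", "3", "斤", "苹果"]

precision, recall, f1 = segmentation_f1(gold, pred)
print(f"precision={precision:.4f} recall={recall:.4f} f1={f1:.4f}")
```

In this example, splitting the number from its unit produces two wrong words in place of one correct word, so 4 of 6 predicted words and 4 of 5 gold words match. This is how a granularity mismatch lowers both precision and recall.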
