Abstract

Exploring potentially useful information from huge amount of textual data produced by microblogging services has attracted much attention in recent years. An important preprocessing step of microblog text mining is to convert natural language texts into proper numerical representations. Due to the short-length characteristics of microblog texts, using term frequency vectors to represent microblog texts will cause “sparse data” problem. Finding proper representations of microblog texts is a challenging issue. In this paper, we apply deep networks to map the high-dimensional representations of microblog texts to low-dimensional representations. To improve the result of dimensionality reduction, we take advantage of the semantic similarity derived from two types of microblogspecific information, namely the retweet relationship and hashtags. Two types of approaches, including modifying training data and modifying the training objective of deep networks, are proposed to make use of microblog-specific information. Experiment results show that the deep models perform better than traditional dimensionality reduction methods such as latent semantic analysis and latent Dirichlet allocation topic model, and the use of microblog-specific information can help to learn better representations.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.