Abstract

Multidimensional text data contains both structured attributes and unstructured text. Unlike the traditional numerical data, it is not straightforward to apply online analytical processing on multidimensional text data. Although some OLAP methods such as topic cube have been proposed in order to effectively utilize its structured information and valuable text data, these methods cant tell the relations of topicwords. Considering that topics usually consist of several subtopics and each subtopic usually contains some topic words, we here use a topic network manner, in which related topic words are connected, to express the complex relations of topics. This paper introduces a new concept of topic network cube to perform OLAP analysis on multidimensional text databases. Firstly, we propose a method called GL-LDA based on Gibbs sampling outputs of Labeled LDA to measure the relations between topic words. Secondly, we give a storagemodel of topic network cubewhich can efficiently generate topic network using GL-LDA. Thirdly, we show how to perform OLAP analysis on topic network cube. Experimental results show that we can analyze multidimensional text databases in different granularity easily and effectively using just a few simple SQL statements, and the output network provides rich and useful information of topics.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.