Abstract

This paper introduces a new approach to identifying topical structures in text data using independent component analysis (ICA). The approach resembles current probabilistic topic modelling approaches in some respects, but it introduces an axiomatic definition of topic structures as independent feature functions over a given vocabulary. In addition, the identification of such structures from data is decoupled from estimates of the topical compositions of the training documents. Considerations motivating these choices are discussed, and a proof-of-concept study on a small corpus is presented to demonstrate the feasibility and interpretative features of the approach. As the computational approach to ICA, a method based on distance covariance is used.
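
The abstract names distance covariance as the dependence measure behind the ICA step but does not spell out how it is computed. As a rough illustration only, the sketch below implements the standard sample (V-statistic) estimator of squared distance covariance with NumPy. The function names `centered_distances` and `distance_covariance_sq` are hypothetical and are not taken from the paper, and this is not the paper's ICA procedure: a distance-covariance-based ICA would presumably search for an unmixing transformation whose output components make a quantity of this kind small.

```python
import numpy as np


def centered_distances(x):
    """Double-centered pairwise Euclidean distance matrix of a sample.

    x has shape (n,) or (n, d); the result has shape (n, n).
    (Hypothetical helper name, used for illustration only.)
    """
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]  # treat a 1-D input as n scalar observations
    # Pairwise Euclidean distances between all observations.
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    # Subtract row and column means, then add back the grand mean.
    return (d
            - d.mean(axis=0, keepdims=True)
            - d.mean(axis=1, keepdims=True)
            + d.mean())


def distance_covariance_sq(x, y):
    """Sample squared distance covariance (V-statistic form).

    x and y must contain the same number of paired observations.
    """
    a = centered_distances(x)
    b = centered_distances(y)
    if a.shape != b.shape:
        raise ValueError("x and y must have the same number of observations")
    # Average of the elementwise product of the centered matrices.
    return float((a * b).mean())
```

In the population version, distance covariance is zero exactly when the two variables are independent (given finite first moments), which is what makes it usable as an independence criterion; the sample estimate above is non-negative and only approximately zero for independent data.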
