Abstract

Topic modelling is a popular unsupervised method for text processing that provides interpretable document representations. One of the most advanced approaches is additively regularized topic models (ARTM). Thanks to its flexibility and rich regularization capabilities, this method achieves better quality than other methods. However, finding a learning strategy that yields high-quality topics is challenging: a user has to select the regularizers, set their coefficients, and determine the order in which they are applied. Moreover, this may require many actual runs of model training, which makes the task time-consuming. At present, there is a lack of research on parameter optimization for ARTM-based models. Our work proposes an approach that formalizes the learning strategy as a vector of parameters, which can then be optimized with an evolutionary algorithm. We also propose a surrogate-assisted modification that uses machine learning models to make the parameter search time-efficient. We investigate different optimization algorithms (evolutionary and Bayesian) and their surrogate-assisted modifications as applied to topic model optimization with the proposed learning-strategy representation. An experimental study conducted on English and Russian datasets indicates that the proposed approaches find high-quality parameter solutions for ARTM and substantially reduce the execution time of the search.
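To make the described idea concrete, the following is a minimal sketch of a surrogate-assisted evolutionary search over a learning-strategy vector, not the paper's actual implementation. Here `evaluate_strategy` is a hypothetical placeholder for the expensive step (training an ARTM model, e.g. with BigARTM, and scoring its topics); the vector length `DIM`, the bounds, and all hyperparameters are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

RNG = np.random.default_rng(0)
DIM = 6            # length of the learning-strategy vector (regularizer taus, pass counts, ...)
BOUNDS = (-1.0, 1.0)

def evaluate_strategy(vec):
    """Hypothetical expensive fitness: train an ARTM model with the
    regularizer schedule encoded in `vec` and return a quality score.
    Replaced here by a cheap synthetic function for illustration."""
    return -np.sum((vec - 0.3) ** 2)

def surrogate_assisted_ea(pop_size=20, generations=15, real_evals_per_gen=5):
    # Initial population, all evaluated with the real (expensive) fitness.
    pop = RNG.uniform(*BOUNDS, size=(pop_size, DIM))
    fit = np.array([evaluate_strategy(ind) for ind in pop])
    archive_x, archive_y = list(pop), list(fit)

    for _ in range(generations):
        # Fit the surrogate on all real evaluations collected so far.
        surrogate = RandomForestRegressor(n_estimators=100, random_state=0)
        surrogate.fit(np.array(archive_x), np.array(archive_y))

        # Mutation: Gaussian perturbation of randomly chosen parents.
        parents = pop[RNG.integers(0, pop_size, size=pop_size)]
        children = np.clip(parents + RNG.normal(0, 0.1, parents.shape), *BOUNDS)

        # Pre-screen children with the cheap surrogate; only the most
        # promising ones get a real (expensive) ARTM training run.
        predicted = surrogate.predict(children)
        for i in np.argsort(predicted)[-real_evals_per_gen:]:
            y = evaluate_strategy(children[i])
            archive_x.append(children[i])
            archive_y.append(y)
            # Replace the worst individual if the child improves on it.
            worst = np.argmin(fit)
            if y > fit[worst]:
                pop[worst], fit[worst] = children[i], y

    best = np.argmax(fit)
    return pop[best], fit[best]

best_vec, best_score = surrogate_assisted_ea()
print("best strategy vector:", best_vec, "score:", best_score)
```

The sketch illustrates where the reported speed-up would come from: the expensive model training is invoked only for the few candidates the surrogate ranks as promising, while the rest of the population is screened by the cheap learned predictor.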
