Abstract

Scientific publications are an essential resource for detecting emerging trends and innovations in a very early stage, by far earlier than patents may allow. Thereby Visual Analytics systems enable a deep analysis by applying commonly unsupervised machine learning methods and investigating a mass amount of data. A main question from the Visual Analytics viewpoint in this context is, do abstracts of scientific publications provide a similar analysis capability compared to their corresponding full-texts? This would allow to extract a mass amount of text documents in a much faster manner. We compare in this paper the topic extraction methods LSI and LDA by using full text articles and their corresponding abstracts to obtain which method and which data are better suited for a Visual Analytics system for Technology and Corporate Foresight. Based on a easy replicable natural language processing approach, we further investigate the impact of lemmatization for LDA and LSI. The comparison will be performed qualitative and quantitative to gather both, the human perception in visual systems and coherence values. Based on an application scenario a visual trend analytics system illustrates the outcomes.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call