Cite2vec: Citation-Driven Document Exploration via Word Embeddings.

Matthew Berger,Katherine Mcdonough,Lee M Seversky

doi:10.1109/tvcg.2016.2598667

Abstract

Effectively exploring and browsing document collections is a fundamental problem in visualization. Traditionally, document visualization is based on a data model that represents each document as the set of its comprised words, effectively characterizing what the document is. In this paper we take an alternative perspective: motivated by the manner in which users search documents in the research process, we aim to visualize documents via their usage, or how documents tend to be used. We present a new visualization scheme - cite2vec - that allows the user to dynamically explore and browse documents via how other documents use them, information that we capture through citation contexts in a document collection. Starting from a usage-oriented word-document 2D projection, the user can dynamically steer document projections by prescribing semantic concepts, both in the form of phrase/document compositions and document:phrase analogies, enabling the exploration and comparison of documents by their use. The user interactions are enabled by a joint representation of words and documents in a common high-dimensional embedding space where user-specified concepts correspond to linear operations of word and document vectors. Our case studies, centered around a large document corpus of computer vision research papers, highlight the potential for usage-based document visualization.

Full Text