Visualizing multiple word similarity measures

Brent Kievit-Kylar, Michael N Jones

doi:10.3758/s13428-012-0236-7

Abstract

Although many recent advances have taken place in corpus-based tools, the techniques used to guide exploration and evaluation of these systems have advanced little. Typically, the plausibility of a semantic space is explored by sampling the nearest neighbors to a target word and evaluating the neighborhood on the basis of the modeler's intuition. Tools for visualizing these large-scale similarity spaces are nearly nonexistent. We present a new open-source tool to plot and visualize semantic spaces, thereby allowing researchers to rapidly explore visual patterns that describe the statistical relations between words. Words are visualized as nodes, and word similarities are shown as directed edges of varying strengths. The "Word-2-Word" visualization environment allows for easy manipulation of graph data to test word similarity measures on their own or in comparisons between multiple similarity metrics. The system contains a large library of statistical relationship models, along with an interface to train them on various language sources. The modularity of the visualization environment allows for quick insertion of new similarity measures so as to compare new corpus-based metrics against the current state of the art. The software is available at www.indiana.edu/~semantic/word2word/.
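
The node-and-edge representation described in the abstract can be illustrated with a minimal sketch. It is not the Word-2-Word implementation, whose internals are not described here. It assumes cosine similarity as the measure and a nearest-neighbor rule for drawing edges; the function names (`cosine`, `build_edges`) and the toy vectors are hypothetical and chosen only for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_edges(vectors, k=1):
    """Link each word to its k most similar words.

    Returns {source: [(target, weight), ...]}. The edges are directed:
    a word's nearest neighbor need not point back at it.
    """
    edges = {}
    for word, vec in vectors.items():
        scored = [
            (other, cosine(vec, other_vec))
            for other, other_vec in vectors.items()
            if other != word
        ]
        # Strongest similarity first, then keep the top k as outgoing edges.
        scored.sort(key=lambda pair: pair[1], reverse=True)
        edges[word] = scored[:k]
    return edges


# Hand-made toy vectors (assumed data, not taken from any corpus).
toy_vectors = {
    "cat": [1.0, 0.9, 0.0],
    "dog": [0.9, 1.0, 0.1],
    "car": [0.0, 0.1, 1.0],
}

edges = build_edges(toy_vectors, k=1)
for source, targets in edges.items():
    for target, weight in targets:
        print(f"{source} -> {target} (strength {weight:.2f})")
```

With these toy vectors, "car" points to "dog" while "dog" points to "cat", which shows why a nearest-neighbor graph is naturally directed. Swapping `cosine` for another function with the same signature is one way to compare several similarity measures over the same set of nodes.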

Full Text