Abstract

Nowadays, scholars dedicate a substantial amount of their work to the querying and browsing of increasingly large collections of research papers on the Internet. In parallel, the recent surge of novel interdisciplinary approaches in science requires scholars to acquire competencies in new fields for which they may lack the necessary vocabulary to formulate adequate queries. This problem, together with the issue of information overload, poses new challenges in the fields of natural language processing (NLP) and visualization design that call for a rapid response from the scientific community. In this respect, we report on a novel visualization scheme that enables the exploration of research paper collections via the analysis of semantic proximity relationships found in author-assigned keywords. Our proposal replaces traditional string queries with a bag-of-words (BoW) extracted from a user-generated auxiliary corpus that captures the intentionality of the research. Continuing along the lines established by other authors in the fields of literature-based discovery (LBD), NLP, and visual analytics (VA), we combine novel advances in the fields of NLP with visual network analysis techniques to offer scholars a perspective of the target corpus that better fits their research interests. To highlight the advantages of our proposal, we conduct two experiments employing a collection of visualization research papers and an auxiliary cross-domain BoW. Here, we showcase how our visualization can be used to maximize the effectiveness of a browsing session by enhancing the language acquisition task, which allows for effectively extracting knowledge that is in line with the users’ previous expectations.

Highlights

  • The main contributions of this paper are outlined hereafter: first, we propose a semantic analysis of author-assigned keywords found in the primary and auxiliary corpora to form a set of keyword vector representations from which we derive proximity data

  • In Section IV, we describe the transformations and algorithms that were applied to the data in order to obtain a joint visualization of the keyword and document spaces, which is exemplified in Section V with two use cases in the context of the interdisciplinary field of visualization in the digital humanities (DH)

  • 1) SINGULAR VALUE DECOMPOSITION: To produce a semantic analysis of the words in a corpus, latent semantic analysis (LSA) makes use of a well-known linear algebra matrix decomposition method called singular value decomposition (SVD), which we briefly summarize for the reader hereafter: SVD is used to decompose a given matrix M into the product of three matrices, M = UΣV^T, where U and V are orthonormal (U^T U = V^T V = I) and Σ is a diagonal matrix of sorted singular values of the same rank r as the input matrix
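The decomposition described above can be sketched in a few lines of numpy. The matrix below is a hypothetical keyword-document co-occurrence matrix invented for illustration; it is not data from the paper's corpora.

```python
import numpy as np

# Hypothetical term-document matrix (rows: keywords, columns: documents).
# Values are illustrative only.
M = np.array([
    [1.0, 0.0, 2.0],
    [0.0, 1.0, 1.0],
    [3.0, 1.0, 0.0],
    [0.0, 2.0, 1.0],
])

# Thin SVD: M = U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(M, full_matrices=False)

# The product of the three factors recovers M
assert np.allclose(U @ np.diag(s) @ Vt, M)

# Orthonormality: U^T U = V^T V = I
assert np.allclose(U.T @ U, np.eye(U.shape[1]))
assert np.allclose(Vt @ Vt.T, np.eye(Vt.shape[0]))

# Singular values come back sorted in descending order
assert np.all(np.diff(s) <= 0)
```

In an LSA setting, truncating Σ to its top-k singular values yields a low-rank approximation of M whose rows serve as dense keyword vectors, from which proximity data can be derived.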


Introduction

A. THE PROBLEM OF INFORMATION OVERLOAD

Recently, the adequate planning and scoping of research efforts has become a key task in academia. For this reason, scholars from all disciplines are spending more time seeking an adequate strategic position within a research body that allows them to develop their work according to practical societal needs and expectations. In this context, the use of electronic scientific databases has become a widespread practice among scholars worldwide. Efforts are currently being made within the scientific community to systematize and automate the…

