Abstract

Explicit semantic analysis (ESA) relies on a very large Wikipedia index matrix in its interpreter component. The interpreter multiplies this matrix by a term vector to produce a high-dimensional concept vector. The similarity between two texts is then measured between their two concept vectors, each of which has numerous dimensions. Both the interpretation step and the similarity measurement step are therefore computationally expensive. This paper proposes an economical scheme of ESA, named econo-ESA. We investigate two aspects of this proposal: dimensional reduction and experiments with various data. We use eight recycled test collections for semantic text similarity. The experimental results show that both the dimensional reduction and the characteristics of the test collections can influence the results. They also show that an appropriate concept reduction in econo-ESA can decrease the cost with only minor differences in the results from the original ESA.

Electronic supplementary material

The online version of this article (doi:10.1186/2193-1801-3-149) contains supplementary material, which is available to authorized users.

Highlights

  • Explicit semantic analysis (ESA) (Gabrilovich and Markovitch 2007) is a unique approach in information retrieval and related research

  • We propose a dimensional reduction of the Wikipedia matrix

  • We determined that a 50% reduction of the index matrix is the best choice, based on the analysis in Section ‘Safe dimensional reduction’, but the experimental results did not always match this expectation


Introduction

ESA (Gabrilovich and Markovitch 2007) is a unique approach in information retrieval and related research. The method measures the relatedness of two texts in a concept space rather than in a term space, and it does so with a straightforward procedure inside a vector space model. This simplicity makes ESA easy to understand. ESA is a variant of the generalized vector space model (GVSM) that uses Wikipedia as its index corpus. Using Wikipedia is an additional advantage for ESA because it yields good results for word and text-fragment relatedness measurements (Gabrilovich and Markovitch 2007).
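
The interpretation and similarity steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the four-term vocabulary, the three concept names, the weights in `INDEX`, and the helper names `term_vector`, `interpret`, `cosine`, and `relatedness` are all hypothetical, chosen only to show a term vector being multiplied by an index matrix and the resulting concept vectors being compared. Cosine similarity is assumed as the comparison measure.

```python
import math

# Toy stand-in for the Wikipedia index matrix (all values are made up):
# one row per concept, one column per vocabulary term.
VOCAB = ["bank", "money", "river", "water"]
CONCEPTS = ["Finance", "Banking", "Hydrology"]
INDEX = [
    [0.8, 0.9, 0.0, 0.0],  # Finance
    [0.9, 0.7, 0.1, 0.0],  # Banking
    [0.2, 0.0, 0.9, 0.8],  # Hydrology
]


def term_vector(text):
    """Count how often each vocabulary term occurs in the text."""
    tokens = text.lower().split()
    return [float(tokens.count(term)) for term in VOCAB]


def interpret(text, index=INDEX):
    """Map a text to concept space: index matrix times term vector."""
    tv = term_vector(text)
    return [sum(w * t for w, t in zip(row, tv)) for row in index]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(u, v)) / (norm_u * norm_v)


def relatedness(text_a, text_b, index=INDEX):
    """Compare two texts through their concept vectors."""
    return cosine(interpret(text_a, index), interpret(text_b, index))


if __name__ == "__main__":
    # "money" and "bank" share no terms, yet they are close in concept space.
    print(relatedness("money", "bank"))
    print(relatedness("money", "water"))
```

In this toy setting, "money" and "bank" have no term in common, so a term-space comparison would score them as unrelated, while their concept vectors overlap strongly. The cost of both steps grows with the number of rows in the index matrix, so passing a smaller `index` with fewer concept rows shortens every concept vector, which is the kind of saving a concept reduction aims for.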
