HASKER: An efficient algorithm for string kernels. Application to polarity classification in various languages

Marius Popescu,Cristian Grozea,Radu Tudor Ionescu

doi:10.1016/j.procs.2017.08.207


Open Access


Journal: Procedia Computer Science
Publication Date: Jan 1, 2017
Citations: 4
License type: cc-by-nc-nd

Affiliation: University of Bucharest, Fraunhofer Institute for Open Communication Systems


Abstract

String kernels have been used successfully for various NLP tasks, ranging from text categorization by topic to native language identification. In this paper, we present a simple and efficient algorithm for computing various spectrum string kernels. When comparing two strings, we store the p-grams of the first string in a hash table, and then perform a hash table lookup for each p-gram that occurs in the second string. In terms of running time, we show that our algorithm can outperform a state-of-the-art tool for computing string similarity. In terms of accuracy, we show that our approach can reach state-of-the-art performance for polarity classification in various languages. Our efficient implementation is freely available online at http://string-kernels.herokuapp.com.
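The hash-table idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`pgrams`, `spectrum_kernel`, `presence_kernel`, `intersection_kernel`) are hypothetical, and the choice of the presence and intersection variants as examples of "various spectrum string kernels" is an assumption based on their standard definitions.

```python
from collections import Counter


def pgrams(s, p):
    """Yield every contiguous substring of length p in s (hypothetical helper)."""
    return (s[i:i + p] for i in range(len(s) - p + 1))


def spectrum_kernel(s, t, p):
    """Sum over p-grams v of count_v(s) * count_v(t)."""
    # Store the p-grams of the first string in a hash table (p-gram -> count).
    table = Counter(pgrams(s, p))
    # Look up each p-gram occurrence of the second string in that table.
    return sum(table.get(g, 0) for g in pgrams(t, p))


def presence_kernel(s, t, p):
    """Number of distinct p-grams that occur in both strings."""
    table = set(pgrams(s, p))
    return sum(1 for g in set(pgrams(t, p)) if g in table)


def intersection_kernel(s, t, p):
    """Sum over p-grams v of min(count_v(s), count_v(t))."""
    table = Counter(pgrams(s, p))
    other = Counter(pgrams(t, p))
    return sum(min(c, table.get(g, 0)) for g, c in other.items())
```

With average-case constant-time hash lookups, each function above does work roughly proportional to the number of p-grams in the two strings (times the cost of hashing a p-gram), rather than comparing every p-gram of one string against every p-gram of the other.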

Full Text