A Probabilistic Approach to String Transformation

Ziqi Wang, Hang Li, Gu Xu, Ming Zhang

doi:10.1109/tkde.2013.11

Abstract

Many problems in natural language processing, data mining, information retrieval, and bioinformatics can be formalized as string transformation: given an input string, the system generates the $k$ most likely output strings corresponding to it. This paper proposes a novel probabilistic approach to string transformation that is both accurate and efficient. The approach comprises a log-linear model, a method for training the model, and an algorithm for generating the top $k$ candidates, with or without a predefined dictionary. The log-linear model is defined as a conditional probability distribution of an output string and a rule set for the transformation, conditioned on an input string. The learning method employs maximum likelihood estimation for parameter estimation. The string generation algorithm, which is based on pruning, is guaranteed to generate the optimal top $k$ candidates. The proposed method is applied to the correction of spelling errors in queries and to the reformulation of queries in web search. Experimental results on large-scale data show that the proposed approach improves upon existing methods in both accuracy and efficiency across different settings.
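To make the idea of scoring outputs with a log-linear model over transformation rules more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the rule weights in `RULE_WEIGHTS` are invented, each candidate applies at most one rule, and candidates are enumerated exhaustively rather than with the pruning-based generation algorithm the abstract mentions. The names `candidates` and `top_k` are hypothetical.

```python
import heapq
import math

# Hypothetical substitution rules (alpha -> beta) with illustrative weights.
# These values are assumptions for the example, not taken from the paper.
RULE_WEIGHTS = {
    ("ni", "in"): 1.2,
    ("e", "a"): 0.4,
    ("tt", "t"): 0.9,
}


def candidates(source):
    """Yield (output, score) for the unchanged string (score 0) and for
    every rewrite of `source` that applies exactly one rule once."""
    yield source, 0.0
    for (alpha, beta), weight in RULE_WEIGHTS.items():
        start = source.find(alpha)
        while start != -1:
            yield source[:start] + beta + source[start + len(alpha):], weight
            start = source.find(alpha, start + 1)


def top_k(source, k):
    """Return the k highest-scoring outputs with softmax probabilities."""
    scored = {}
    for output, score in candidates(source):
        # Simplification: keep only the best-scoring derivation per output.
        scored[output] = max(score, scored.get(output, -math.inf))
    # Normalizer over the enumerated candidate set (log-linear form).
    z = sum(math.exp(s) for s in scored.values())
    best = heapq.nlargest(k, scored.items(), key=lambda item: item[1])
    return [(output, math.exp(score) / z) for output, score in best]
```

For example, `top_k("prniter", 2)` ranks `"printer"` first, because the assumed rule `ni -> in` carries the largest weight among the rules that match. A realistic system would learn the weights from data and would avoid exhaustive enumeration.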

Journal: IEEE Transactions on Knowledge and Data Engineering
Publication Date: May 1, 2014
Citations: 19

Similar Papers

GRAPH-BASED METHODS FOR LANGUAGE PROCESSING AND INFORMATION RETRIEVAL
Dragomir Radev
01 Jan 2006

AN EFFICIENT APPROACH TO QUERY REFORMULATION IN WEB SEARCH
M Kiran Kumar
International Journal of Research in Engineering and Technology | VOL. 04
25 Jun 2015

Graph-Based Natural Language Processing and Information Retrieval Rada Mihalcea and Dragomir Radev (University of North Texas and University of Michigan) Cambridge, UK: Cambridge University Press, 2011, viii+192 pp; hardbound, ISBN 978-0-521-89613-9, $65.00
Chris Biemann
Computational Linguistics | VOL. 38
01 Mar 2012

TermExtract: Accuracy of Compound Noun Detection in Japanese
Motoki Miyashita ... Vitaly Klyuev
01 Jan 2014
