Abstract
Probabilistic language models, e.g. those based on recurrent neural networks such as long short-term memory models (LSTMs), often face the problem of finding a high-probability prediction from a sequence of random variables over a set of tokens. This is commonly addressed using a form of greedy decoding such as beam search, where a limited number of highest-likelihood paths (the beam width) of the decoder are kept, and at the end the maximum-likelihood path is chosen. In this work, we construct a quantum algorithm to find the globally optimal parse (i.e. for infinite beam width) with high constant success probability. When the input to the decoder follows a power law with exponent k > 0, our algorithm has runtime R^{nf(R,k)}, where R is the alphabet size and n the input length; here f < 1/2, and f → 0 exponentially fast with increasing k, hence making our algorithm always more than quadratically faster than its classical counterpart. We further modify our procedure to recover a finite beam width variant, which enables an even stronger empirical speedup while still retaining higher accuracy than possible classically. Finally, we apply this quantum beam search decoder to Mozilla’s implementation of Baidu’s DeepSpeech neural net, which we show to exhibit such a power law word rank frequency.
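As context for the beam-width trade-off described above, classical beam search decoding can be sketched in a few lines. This is a toy illustration only, not the paper's algorithm; the token distributions and beam width below are made-up examples:

```python
import math
import heapq

def beam_search_decode(distributions, beam_width):
    """Greedy beam-search decoding over a sequence of token distributions.

    distributions: list of dicts mapping token -> probability, one per step.
    At each step, only the `beam_width` highest-likelihood partial paths are
    kept; the best surviving complete path and its log-probability are returned.
    """
    beam = [(0.0, [])]  # each entry is (log-probability, path); start empty
    for dist in distributions:
        candidates = []
        for log_prob, path in beam:
            for token, prob in dist.items():
                if prob > 0:
                    candidates.append((log_prob + math.log(prob), path + [token]))
        # Prune: keep only the beam_width best partial paths.
        beam = heapq.nlargest(beam_width, candidates, key=lambda c: c[0])
    return max(beam, key=lambda c: c[0])

# Toy example: three steps over the alphabet {a, b}.
steps = [{"a": 0.6, "b": 0.4}, {"a": 0.1, "b": 0.9}, {"a": 0.7, "b": 0.3}]
log_p, path = beam_search_decode(steps, beam_width=2)
print(path)  # → ['a', 'b', 'a']
```

With a finite beam width the decoder can discard the prefix of the true maximum-likelihood path early; the quantum decoder in the paper avoids this by targeting the infinite-beam-width optimum directly.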
Highlights
A recurring task in the context of parsing and neural sequence-to-sequence models—such as machine translation (Sutskever et al. 2011; Sutskever et al. 2014), natural language processing (Schmidhuber 2014) and generative models (Graves 2013)—is to find an optimal path of tokens from a sequential list of probability distributions.
Our novel algorithmic contribution is to analyse a recently developed quantum maximum finding algorithm (Apeldoorn et al. 2017) and its expected runtime when provided with a biased quantum sampler, which we develop for formal grammars, under the premise that at each step the input tokens follow a power-law distribution; for a probabilistic sequence obtained from Mozilla's DeepSpeech, the quantum search decoder runs a power of ≈ 4–5 faster than possible classically (Fig. 2).
We analyse the runtime of Algorithm 2 for various choices of beam width numerically, and assess its performance on a concrete example—Mozilla's DeepSpeech implementation, a speech-to-text long short-term memory (LSTM) network which we show to follow a power-law token distribution at each output frame.
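The power-law claim for a per-frame output distribution can be checked by a rank-frequency fit in log-log space. A minimal sketch using a synthetic frame (the 28-token alphabet size and exact Zipf probabilities are illustrative assumptions, not DeepSpeech data):

```python
import numpy as np

def fit_power_law_exponent(probs):
    """Estimate the exponent k of a power-law rank-frequency distribution.

    probs: 1-D array of token probabilities for one output frame.
    Sorts probabilities into rank order and fits log p_r ≈ -k log r + c
    by least squares, returning the estimated k.
    """
    ranked = np.sort(np.asarray(probs, dtype=float))[::-1]
    ranked = ranked[ranked > 0]  # drop zero-probability tokens
    ranks = np.arange(1, len(ranked) + 1)
    # Linear regression in log-log space: the slope equals -k.
    slope, _ = np.polyfit(np.log(ranks), np.log(ranked), 1)
    return -slope

# Synthetic frame whose probabilities follow an exact power law p_r ∝ r^{-2}.
r = np.arange(1, 29)   # 28 tokens, roughly an English character alphabet
p = r ** -2.0
p /= p.sum()
print(round(fit_power_law_exponent(p), 3))  # → 2.0
```

On real decoder output the fit would be applied frame by frame; a consistently large estimated exponent k is what drives the f → 0 behaviour in the runtime bound.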
Summary
A recurring task in the context of parsing and neural sequence-to-sequence models—such as machine translation (Sutskever et al. 2011; Sutskever et al. 2014), natural language processing (Schmidhuber 2014) and generative models (Graves 2013)—is to find an optimal path of tokens (e.g. words or letters) from a sequential list of probability distributions. Such a distribution can for instance be produced at the output layer of a recurrent neural network, e.g. a long short-term memory model (LSTM). A related task is found in transition-based parsing of formal languages, such as context-free grammars (Hopcroft et al. 2001; Zhang and Clark 2008; Zhang and Nivre 2011; Zhu et al. 2015; Dyer et al. 2015). In this model, an input string is processed token by token, and a heuristic prediction determines the next parsing action at each step.
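Classically, the globally optimal parse corresponds to an exhaustive search over all R^n token paths, keeping only those accepted by the language. A brute-force sketch of that optimum (the balanced-parentheses language and the step distributions are toy assumptions, chosen only to make the search concrete):

```python
import itertools
import math

def optimal_parse(distributions, is_valid):
    """Exhaustive maximum-likelihood decoding (infinite beam width).

    Enumerates all R^n token paths, keeps those accepted by the predicate
    `is_valid` (e.g. membership in a formal language), and returns the
    accepted path of highest probability with that probability.
    Exponential time classically; this optimum is what the quantum
    search decoder targets.
    """
    tokens = list(distributions[0])
    best = (-math.inf, None)
    for path in itertools.product(tokens, repeat=len(distributions)):
        prob = math.prod(d[t] for d, t in zip(distributions, path))
        if prob > best[0] and is_valid(path):
            best = (prob, path)
    return best

# Toy formal language: strings of matched parentheses over {"(", ")"}.
def balanced(path):
    depth = 0
    for t in path:
        depth += 1 if t == "(" else -1
        if depth < 0:
            return False
    return depth == 0

steps = [{"(": 0.9, ")": 0.1}] * 4
p, path = optimal_parse(steps, balanced)
print(path)  # → ('(', '(', ')', ')')
```

The per-step argmax string "((((" is not in the language here, which is exactly why greedy or narrow-beam decoding can miss the optimal grammatical parse.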