Abstract

Probabilistic language models, e.g. those based on recurrent neural networks such as long short-term memory models (LSTMs), often face the problem of finding a high probability prediction from a sequence of random variables over a set of tokens. This is commonly addressed using a form of greedy decoding such as beam search, where a limited number of highest-likelihood paths (the beam width) of the decoder are kept, and at the end the maximum-likelihood path is chosen. In this work, we construct a quantum algorithm to find the globally optimal parse (i.e. for infinite beam width) with high constant success probability. When the input to the decoder follows a power law with exponent k > 0, our algorithm has runtime R^{nf(R,k)}, where R is the alphabet size, n the input length; here f < 1/2, and f → 0 exponentially fast with increasing k, hence making our algorithm always more than quadratically faster than its classical counterpart. We further modify our procedure to recover a finite beam width variant, which enables an even stronger empirical speedup while still retaining higher accuracy than possible classically. Finally, we apply this quantum beam search decoder to Mozilla’s implementation of Baidu’s DeepSpeech neural net, which we show to exhibit such a power law word rank frequency.
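For readers unfamiliar with the classical baseline referred to above, the following Python sketch shows beam search over a sequence of per-step token distributions, together with the exhaustive search that corresponds to infinite beam width. The toy frames and the no-repetition coupling term are illustrative assumptions of ours, not taken from the paper; without some coupling between steps the optimum would factorize per frame.

```python
import math
from itertools import product

def beam_search(step_probs, transition, beam_width):
    """Classical beam search decoder.

    step_probs: one dict {token: probability} per output frame of the model.
    transition: function (prev_token_or_None, token) -> extra log-score,
                e.g. a language-model term coupling neighbouring tokens.
    Keeps only the `beam_width` highest-scoring partial paths at each step.
    """
    beam = [((), 0.0)]  # (partial path, accumulated log-score)
    for dist in step_probs:
        candidates = []
        for path, score in beam:
            prev = path[-1] if path else None
            for tok, p in dist.items():
                if p > 0.0:
                    candidates.append(
                        (path + (tok,), score + math.log(p) + transition(prev, tok)))
        beam = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return max(beam, key=lambda c: c[1])

def exhaustive_decode(step_probs, transition):
    """'Infinite beam width': score all R^n paths and return the global optimum."""
    best = ((), -math.inf)
    for path in product(*(d.keys() for d in step_probs)):
        score, prev = 0.0, None
        for dist, tok in zip(step_probs, path):
            score += math.log(dist[tok]) + transition(prev, tok)
            prev = tok
        if score > best[1]:
            best = (path, score)
    return best

if __name__ == "__main__":
    # Toy input: three output frames over the alphabet {a, b, c}.
    frames = [{"a": 0.5, "b": 0.3, "c": 0.2},
              {"a": 0.1, "b": 0.6, "c": 0.3},
              {"a": 0.4, "b": 0.4, "c": 0.2}]
    no_repeat = lambda prev, tok: -2.0 if tok == prev else 0.0  # toy coupling term
    print(beam_search(frames, no_repeat, beam_width=2))
    print(exhaustive_decode(frames, no_repeat))
```

With a small beam width the pruning step can discard the prefix of the globally optimal path, which is exactly the failure mode the infinite-beam-width (exhaustive) decoder avoids.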

Highlights

  • A recurring task in the context of parsing and neural sequence-to-sequence models—such as machine translation (Ilya et al 2011; Sutskever et al 2014), natural language processing (Schmidhuber 2014) and generative models (Graves 2013)—is to find an optimal path of tokens from a sequential list of probability distributions

  • Our novel algorithmic contribution is to analyse a recently developed quantum maximum finding algorithm (Apeldoorn et al 2017) and its expected runtime when provided with a biased quantum sampler that we developed for formal grammars, under the premise that at each step the input tokens follow a power-law distribution; for a probabilistic sequence obtained from Mozilla’s DeepSpeech, the quantum search decoder is polynomially faster than possible classically, with a speedup exponent of ≈ 4–5 (Fig. 2)

  • We analyse the runtime of Algorithm 2 for various choices of beam width numerically, and evaluate its performance on a concrete example—Mozilla’s DeepSpeech implementation, a speech-to-text long short-term memory (LSTM) network—which we show to follow a power-law token distribution at each output frame (see the sketch after this list)
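The power-law premise in the highlights can be checked numerically. Below is a minimal sketch, our own code rather than the authors', that estimates the Zipf exponent k of a single output frame by a least-squares fit of log probability against log rank; the synthetic frame in the example is an assumption used only to test the fit.

```python
import math

def zipf_exponent(probs):
    """Least-squares estimate of k in p_r ∝ r^(-k) from one probability vector
    (e.g. a single output frame of the decoder).

    Sorts the probabilities into rank order and fits a straight line to
    (log rank, log probability); the negated slope estimates k.
    """
    ranked = sorted((p for p in probs if p > 0.0), reverse=True)
    xs = [math.log(r) for r in range(1, len(ranked) + 1)]
    ys = [math.log(p) for p in ranked]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
            sum((x - mean_x) ** 2 for x in xs)
    return -slope

if __name__ == "__main__":
    # Synthetic frame drawn from an exact power law with exponent 2.
    R, k = 64, 2.0
    frame = [r ** (-k) for r in range(1, R + 1)]
    total = sum(frame)
    frame = [p / total for p in frame]
    print(zipf_exponent(frame))  # ≈ 2.0
```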


Summary

Introduction

A recurring task in the context of parsing and neural sequence-to-sequence models—such as machine translation (Ilya et al 2011; Sutskever et al 2014), natural language processing (Schmidhuber 2014) and generative models (Graves 2013)—is to find an optimal path of tokens (e.g. words or letters) from a sequential list of probability distributions. Such a distribution can for instance be produced at the output layer of a recurrent neural network, e.g. a long short-term memory network (LSTM). A related task is found in transition-based parsing of formal languages, such as context-free grammars (Hopcroft et al 2001; Zhang and Clark 2008; Zhang and Nivre 2011; Zhu et al 2015; Dyer et al 2015). In this model, an input string is processed token by token, and a heuristic prediction …
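To make the two tasks above concrete, here is a small Python sketch that finds the most likely token path accepted by a formal language, here given as a toy finite automaton. The automaton (strings over {a, b} with an even number of b's) and the three input frames are hypothetical, and the brute-force enumeration stands in only structurally for the decoder analysed in the later sections.

```python
import math
from itertools import product

# Toy regular language over {a, b}: strings with an even number of b's,
# encoded as a deterministic finite automaton (illustrative example only).
DFA = {
    "start": 0,
    "accept": {0},
    "delta": {(0, "a"): 0, (0, "b"): 1, (1, "a"): 1, (1, "b"): 0},
}

def accepts(dfa, path):
    """Run the automaton over the token path and test for acceptance."""
    state = dfa["start"]
    for tok in path:
        state = dfa["delta"][(state, tok)]
    return state in dfa["accept"]

def most_likely_parse(step_probs, dfa):
    """Exhaustively search all token paths and return the highest-probability
    one accepted by the automaton (the 'globally optimal parse')."""
    best, best_logp = None, -math.inf
    for path in product(*(d.keys() for d in step_probs)):
        if not accepts(dfa, path):
            continue
        logp = sum(math.log(d[t]) for d, t in zip(step_probs, path))
        if logp > best_logp:
            best, best_logp = path, logp
    return best, best_logp

if __name__ == "__main__":
    frames = [{"a": 0.4, "b": 0.6},
              {"a": 0.7, "b": 0.3},
              {"a": 0.2, "b": 0.8}]
    print(most_likely_parse(frames, DFA))  # ('b', 'a', 'b') is accepted and most likely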

Main results
Quantum search decoding
Biased quantum sampling from a regular or context-free grammar
The quantum search decoder
Power law decoder input
MOST LIKELY PARSE: query bound
HIGHEST SCORE PARSE: simple query bound
MOST LIKELY PARSE: full query bound
Quantum beam search decoding
Analysis of the output rank frequency
Runtime bounds for quantum beam search decoding
Summary and conclusions
Regular languages and finite state automata
Context-free grammars and pushdown automata
Constant post-amplification
Non-constant post-amplification