Abstract
We propose algorithms to train production-quality n-gram language models using federated learning. Federated learning is a distributed computation platform that can be used to train global models for portable devices such as smart phones. Federated learning is especially relevant for applications handling privacy-sensitive data, such as virtual keyboards, because training is performed without the users’ data ever leaving their devices. While the principles of federated learning are fairly generic, its methodology assumes that the underlying models are neural networks. However, virtual keyboards are typically powered by n-gram language models for latency reasons. We propose to train a recurrent neural network language model using the decentralized FederatedAveraging algorithm and to approximate this federated model server-side with an n-gram model that can be deployed to devices for fast inference. Our technical contributions include ways of handling large vocabularies, algorithms to correct capitalization errors in user data, and efficient finite state transducer algorithms to convert word language models to word-piece language models and vice versa. The n-gram language models trained with federated learning are compared to n-grams trained with traditional server-based algorithms using A/B tests on tens of millions of users of a virtual keyboard. Results are presented for two languages, American English and Brazilian Portuguese. This work demonstrates that high-quality n-gram language models can be trained directly on client mobile devices without sensitive training data ever leaving the devices.
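The abstract describes training an RNN LM with the decentralized FederatedAveraging algorithm. At its core, FederatedAveraging combines locally trained client models into a global model by weighting each client's update by its number of training examples. The sketch below is a minimal illustration of that averaging step only; the function and variable names are illustrative and not taken from the paper's implementation.

```python
import numpy as np

def federated_average(client_updates):
    """One FederatedAveraging aggregation round (illustrative sketch).

    `client_updates` is a list of (weights, num_examples) pairs, where
    `weights` is a NumPy array of locally trained model parameters.
    Returns the example-count-weighted average of the client weights.
    """
    total_examples = sum(n for _, n in client_updates)
    averaged = np.zeros_like(client_updates[0][0])
    for weights, num_examples in client_updates:
        averaged += (num_examples / total_examples) * weights
    return averaged
```

In the full algorithm this weighted average replaces the server model, which is then redistributed to a new sample of devices for the next round; the raw training data never leaves the clients, only the model updates do.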
Highlights
1.1 Virtual keyboard applications: Virtual keyboards for mobile devices provide a host of functionalities, from decoding noisy spatial signals from tap and glide typing inputs to providing auto-corrections, word completions, and next-word predictions.
As shown in Suresh et al (2019a), an n-gram language model (LM) approximated from such a recurrent neural network (RNN) LM is of higher quality than an n-gram LM trained on user data directly.
The paper is organized along the lines of the challenges associated with converting RNN LMs to n-gram LMs for virtual keyboards: the feasibility of training neural models with a large vocabulary, inconsistent capitalization in the training data, and data sparsity in morphologically rich languages.
Summary
Virtual keyboards for mobile devices provide a host of functionalities, from decoding noisy spatial signals from tap and glide typing inputs to providing auto-corrections, word completions, and next-word predictions. These features must fit within tight RAM and CPU budgets, and operate under strict latency constraints. For computation and memory efficiency, keyboard LMs typically have higher-order n-grams over a subset of the vocabulary, e.g. the most frequent 64K words, while the rest of the vocabulary only has unigrams. As shown in Suresh et al (2019a), an n-gram LM approximated from such an RNN LM is of higher quality than an n-gram LM trained on user data directly. We do not propose to learn n-gram models directly via FederatedAveraging of n-gram counts for all orders.
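The two-tier vocabulary structure described above (full n-gram probabilities for the most frequent words, unigram-only probabilities for the tail) can be sketched as a simple lookup. This is an illustrative sketch under assumed data structures, not the paper's or any keyboard's actual implementation; all names and the backoff constant are hypothetical.

```python
def score(word, context, ngram_probs, unigram_probs, core_vocab,
          floor=1e-6):
    """Return P(word | context) for a two-tier keyboard LM (sketch).

    `ngram_probs` maps full n-gram tuples to probabilities and covers
    only words in `core_vocab` (e.g. the 64K most frequent words);
    `unigram_probs` covers the wider vocabulary. Words absent from both
    tables receive a small `floor` probability.
    """
    if word in core_vocab:
        key = tuple(context) + (word,)
        if key in ngram_probs:
            # Core-vocabulary word with a matching higher-order n-gram.
            return ngram_probs[key]
    # Tail word, or core word with no matching n-gram: unigram fallback.
    return unigram_probs.get(word, floor)
```

Restricting higher-order n-grams to a frequent core vocabulary keeps the model's memory footprint small enough for on-device inference while still covering tail words at unigram granularity.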