Abstract

Weighted finite-state transducers have been shown to be a general and efficient representation in many applications such as text and speech processing, computational biology, and machine learning. The composition of weighted finite-state transducers constitutes a fundamental and common operation across these applications. The NP-hardness of the composition computation problem presents a challenge that leads us to devise efficient large-scale algorithms when more than two transducers are considered. This paper describes a parallel computation of weighted finite-state transducer composition in the MapReduce framework. To the best of our knowledge, this paper is the first to tackle this task using MapReduce methods. First, we analyze the communication cost of this problem using the model of Afrati et al. Then, we propose three MapReduce methods based respectively on input alphabet mapping, state mapping, and hybrid mapping. Finally, extensive experiments on a wide range of weighted finite-state transducers are conducted to compare the proposed methods and show their efficiency on large-scale data.
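To make the join-like structure of such a computation concrete, the following is a minimal sketch of composing two epsilon-free WFSTs in a single MapReduce-style round, keying transitions by the symbol shared between the first transducer's output and the second transducer's input. This is only one plausible reading of the "input alphabet mapping" idea, not the paper's exact algorithm; the transition layout, the names map_phase and reduce_phase, and the use of ordinary multiplication as the semiring product are assumptions made for illustration.

```python
# Illustrative sketch (not the paper's exact method): composing two epsilon-free
# WFSTs as a MapReduce-style join keyed on the shared symbol.
from collections import defaultdict
from itertools import product

# A transition is (src_state, in_symbol, out_symbol, weight, dst_state).
T1 = [(0, "a", "x", 0.5, 1), (1, "b", "y", 0.4, 0)]
T2 = [(0, "x", "u", 0.3, 1), (1, "y", "v", 0.2, 0)]

def map_phase(t1, t2):
    """Key each transition by the symbol shared between T1's output and T2's input."""
    for (p, a, b, w, q) in t1:
        yield b, ("T1", (p, a, b, w, q))
    for (r, b, c, w, s) in t2:
        yield b, ("T2", (r, b, c, w, s))

def reduce_phase(pairs):
    """Group by key, then pair every T1 transition with every T2 transition that
    shares the key; weights combine with the semiring product (here, multiplication)."""
    groups = defaultdict(lambda: ([], []))
    for key, (tag, t) in pairs:
        groups[key][0 if tag == "T1" else 1].append(t)
    for left, right in groups.values():
        for (p, a, b, w1, q), (r, _, c, w2, s) in product(left, right):
            yield ((p, r), a, c, w1 * w2, (q, s))

composed = list(reduce_phase(map_phase(T1, T2)))
print(composed)  # transitions of T1 composed with T2, over paired states
```

In an actual MapReduce job, the grouping done locally by reduce_phase would be performed by the shuffle, with one reducer per key; the sketch only shows the data movement pattern such a keying induces.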

Highlights

  • Weighted finite-state transducers (WFSTs) have been used in a wide range of applications such as digital image processing [1], speech recognition [2], large-scale statistical machine translation [3], cryptography [4], and more recently computational biology [5], where pairwise rational kernels are computed for metabolic network prediction, as well as many other applications [6,7,8]

  • The composition operation takes as input two or more WFSTs (T_i)_{1≤i≤n} and outputs the composed WFST T realizing the composition of all input WFSTs, such that the input alphabet of T_{i+1} coincides with the output alphabet of T_i

  • Experiments are conducted on a large variety of WFST data sets randomly generated using the FAdo library [31] with various combinations of attributes, including the number of states |Q|, the input alphabet size |A|, and the output alphabet size |B| (a rough stand-in sketch of this generation step follows this list)
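The FAdo calls used for this generation step are not shown in this excerpt, so the following is only a hypothetical stand-in in plain Python: it builds a random WFST with the same attributes, a chosen number of states |Q|, input alphabet size |A|, and output alphabet size |B|, with one weighted transition per (state, input symbol) pair.

```python
# Hypothetical stand-in for the random generation step (the FAdo API used in the
# paper is not shown here). Alphabet sizes up to 26 are assumed for simplicity.
import random
import string

def random_wfst(num_states, input_size, output_size, seed=0):
    rng = random.Random(seed)
    A = list(string.ascii_lowercase[:input_size])   # input alphabet of size |A|
    B = list(string.ascii_uppercase[:output_size])  # output alphabet of size |B|
    transitions = []
    for q in range(num_states):
        for a in A:
            # one transition per (state, input symbol): random output, weight, target
            transitions.append(
                (q, a, rng.choice(B), rng.random(), rng.randrange(num_states))
            )
    return {"states": list(range(num_states)), "A": A, "B": B, "transitions": transitions}

wfst = random_wfst(num_states=100, input_size=5, output_size=5)
print(len(wfst["transitions"]))  # 100 states x 5 input symbols = 500 transitions
```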



Introduction

Weighted finite-state transducers (WFSTs) have been used in a wide range of applications such as digital image processing [1], speech recognition [2], large-scale statistical machine translation [3], cryptography [4], and more recently computational biology [5], where pairwise rational kernels are computed for metabolic network prediction, as well as many other applications [6,7,8]. Computing the composition of WFSTs builds on the standard composition of unweighted finite-state transducers. It takes as input two or more WFSTs (T_i)_{1≤i≤n} and outputs the composed WFST T realizing the composition of all input WFSTs, such that the input alphabet of T_{i+1} coincides with the output alphabet of T_i. The time complexity of this operation is on the order of the product of the sizes of the input transducers, O(∏_{i=1}^{n} |T_i|).
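As a point of reference for that bound, here is a minimal sequential sketch of chained composition in the epsilon-free case; the transition tuple layout, the function names, and the use of ordinary multiplication as the semiring product are illustrative assumptions rather than the paper's notation. Each pairwise step may pair every matching transition of one operand with every matching transition of the other, which is why the size of the result, and hence the running time, can grow as the product of the input sizes.

```python
# Minimal sequential baseline (illustrative, epsilon-free case): chained pairwise
# composition whose result can grow as the product of the input sizes.
from functools import reduce

def compose(t1, t2):
    """Standard composition of two epsilon-free WFSTs given as transition lists
    (src, in_sym, out_sym, weight, dst); weights combine by multiplication."""
    out = []
    for (p, a, b, w1, q) in t1:
        for (r, b2, c, w2, s) in t2:
            if b == b2:  # output label of t1 must match input label of t2
                out.append(((p, r), a, c, w1 * w2, (q, s)))
    return out

def compose_chain(transducers):
    """Compose T1, T2, ..., Tn left to right; transition counts are bounded by
    the product of the individual counts, matching the O(prod |T_i|) bound."""
    return reduce(compose, transducers)

T1 = [(0, "a", "x", 0.5, 1)]
T2 = [(0, "x", "u", 0.3, 1)]
T3 = [(0, "u", "1", 0.2, 1)]
print(compose_chain([T1, T2, T3]))  # one composed transition over nested state pairs
```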

