Abstract

Weighted finite-state transducers have been shown to be a general and efficient representation in many applications such as text and speech processing, computational biology, and machine learning. The composition of weighted finite-state transducers constitutes a fundamental and common operation across these applications. The NP-hardness of the composition computation problem presents a challenge that leads us to devise efficient large-scale algorithms when more than two transducers are considered. This paper describes a parallel computation of weighted finite-state transducer composition in the MapReduce framework. To the best of our knowledge, this paper is the first to tackle this task using MapReduce methods. First, we analyze the communication cost of this problem using the model of Afrati et al. Then, we propose three MapReduce methods based respectively on input alphabet mapping, state mapping, and hybrid mapping. Finally, extensive experiments on a wide range of weighted finite-state transducers are conducted to compare the proposed methods and show their efficiency on large-scale data.
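To make the join-like structure of such a computation concrete, the following is a minimal sketch of composing two epsilon-free WFSTs in a single MapReduce-style round, keying transitions by the symbol shared between the first transducer's output and the second transducer's input. This is only one plausible reading of the "input alphabet mapping" idea, not the paper's exact algorithm; the transition layout, the names map_phase and reduce_phase, and the use of ordinary multiplication as the semiring product are assumptions made for illustration.

```python
# Illustrative sketch (not the paper's exact method): composing two epsilon-free
# WFSTs as a MapReduce-style join keyed on the shared symbol.
from collections import defaultdict
from itertools import product

# A transition is (src_state, in_symbol, out_symbol, weight, dst_state).
T1 = [(0, "a", "x", 0.5, 1), (1, "b", "y", 0.4, 0)]
T2 = [(0, "x", "u", 0.3, 1), (1, "y", "v", 0.2, 0)]

def map_phase(t1, t2):
    """Key each transition by the symbol shared between T1's output and T2's input."""
    for (p, a, b, w, q) in t1:
        yield b, ("T1", (p, a, b, w, q))
    for (r, b, c, w, s) in t2:
        yield b, ("T2", (r, b, c, w, s))

def reduce_phase(pairs):
    """Group by key, then pair every T1 transition with every T2 transition that
    shares the key; weights combine with the semiring product (here, multiplication)."""
    groups = defaultdict(lambda: ([], []))
    for key, (tag, t) in pairs:
        groups[key][0 if tag == "T1" else 1].append(t)
    for left, right in groups.values():
        for (p, a, b, w1, q), (r, _, c, w2, s) in product(left, right):
            yield ((p, r), a, c, w1 * w2, (q, s))

composed = list(reduce_phase(map_phase(T1, T2)))
print(composed)  # transitions of T1 composed with T2, over paired states
```

In an actual MapReduce job, the grouping done locally by reduce_phase would be performed by the shuffle, with one reducer per key; the sketch only shows the data movement pattern such a keying induces.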

Highlights

  • Weighted finite-state transducers (WFSTs) have been used in a wide range of applications such as digital image processing [1], speech recognition [2], large-scale statistical machine translation [3], cryptography [4], and more recently computational biology [5], where pairwise rational kernels are computed for metabolic network prediction, as well as many other applications [6,7,8]

  • The composition operation takes as input two or more WFSTs (T_i)_{1≤i≤n} and outputs the composed WFST T realizing the composition of all input WFSTs, such that the input alphabet of T_{i+1} coincides with the output alphabet of T_i

  • Experiments are conducted on a large variety of WFST data sets randomly generated using the FAdo library [31] with various combinations of attributes, including the number of states |Q|, the input alphabet size |A|, and the output alphabet size |B| (a rough stand-in sketch of this generation step follows this list)
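The FAdo calls used for this generation step are not shown in this excerpt, so the following is only a hypothetical stand-in in plain Python: it builds a random WFST with the same attributes, a chosen number of states |Q|, input alphabet size |A|, and output alphabet size |B|, with one weighted transition per (state, input symbol) pair.

```python
# Hypothetical stand-in for the random generation step (the FAdo API used in the
# paper is not shown here). Alphabet sizes up to 26 are assumed for simplicity.
import random
import string

def random_wfst(num_states, input_size, output_size, seed=0):
    rng = random.Random(seed)
    A = list(string.ascii_lowercase[:input_size])   # input alphabet of size |A|
    B = list(string.ascii_uppercase[:output_size])  # output alphabet of size |B|
    transitions = []
    for q in range(num_states):
        for a in A:
            # one transition per (state, input symbol): random output, weight, target
            transitions.append(
                (q, a, rng.choice(B), rng.random(), rng.randrange(num_states))
            )
    return {"states": list(range(num_states)), "A": A, "B": B, "transitions": transitions}

wfst = random_wfst(num_states=100, input_size=5, output_size=5)
print(len(wfst["transitions"]))  # 100 states x 5 input symbols = 500 transitions
```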



Introduction

Weighted finite-state transducers (WFSTs) have been used in a wide range of applications such as digital image processing [1], speech recognition [2], large-scale statistical machine translation [3], cryptography [4], and more recently computational biology [5], where pairwise rational kernels are computed for metabolic network prediction, as well as many other applications [6,7,8]. Computing the composition of WFSTs builds on the standard composition of unweighted finite-state transducers. It takes as input two or more WFSTs (T_i)_{1≤i≤n} and outputs the composed WFST T realizing the composition of all input WFSTs, such that the input alphabet of T_{i+1} coincides with the output alphabet of T_i. The time complexity of this operation is on the order of the product of the sizes of the input transducers, O(∏_{i=1}^{n} |T_i|).
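As a point of reference for that bound, here is a minimal sequential sketch of chained composition in the epsilon-free case; the transition tuple layout, the function names, and the use of ordinary multiplication as the semiring product are illustrative assumptions rather than the paper's notation. Each pairwise step may pair every matching transition of one operand with every matching transition of the other, which is why the size of the result, and hence the running time, can grow as the product of the input sizes.

```python
# Minimal sequential baseline (illustrative, epsilon-free case): chained pairwise
# composition whose result can grow as the product of the input sizes.
from functools import reduce

def compose(t1, t2):
    """Standard composition of two epsilon-free WFSTs given as transition lists
    (src, in_sym, out_sym, weight, dst); weights combine by multiplication."""
    out = []
    for (p, a, b, w1, q) in t1:
        for (r, b2, c, w2, s) in t2:
            if b == b2:  # output label of t1 must match input label of t2
                out.append(((p, r), a, c, w1 * w2, (q, s)))
    return out

def compose_chain(transducers):
    """Compose T1, T2, ..., Tn left to right; transition counts are bounded by
    the product of the individual counts, matching the O(prod |T_i|) bound."""
    return reduce(compose, transducers)

T1 = [(0, "a", "x", 0.5, 1)]
T2 = [(0, "x", "u", 0.3, 1)]
T3 = [(0, "u", "1", 0.2, 1)]
print(compose_chain([T1, T2, T3]))  # one composed transition over nested state pairs
```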

