Abstract
This paper presents a new model for word alignments between parallel sentences, which allows one to accurately estimate different parameters in a computationally efficient way. An application of this model to bilingual terminology extraction, where terms are identified in one language and guessed, through the alignment process, in the other, is also described. An experiment conducted on a small English-French parallel corpus gave results with high precision, demonstrating the validity of the model.

1 Introduction

Early works (Gale and Church, 1993; Brown et al., 1993) and, to a certain extent, (Kay and Röscheisen, 1993) presented methods to extract bilingual lexicons of words from a parallel corpus, relying on the distribution of the words in the set of parallel sentences (or other units). (Brown et al., 1993) then extended their method and established a sound series of probabilistic models, relying on different parameters describing how words within parallel sentences are aligned to each other. On the other hand, (Dagan et al., 1993) proposed an algorithm, borrowed from the field of dynamic programming and based on the output of their previous work, to find the best alignment, subject to certain constraints, between words in parallel sentences. A similar algorithm was used by (Vogel et al., 1996). Investigating alignments at the sentence level makes it possible to clean and refine the lexicons otherwise extracted from a parallel corpus as a whole, pruning what (Melamed, 1996) calls indirect associations.

Now, what differentiates the models and algorithms proposed are the sets of parameters and constraints they rely on, their ability to find an appropriate solution under the constraints defined, and their ability to integrate new parameters cleanly. We want to present here a model of the possible alignments in the form of flow networks.
This representation makes it possible to define different kinds of alignments and to find the most probable alignment, or an approximation of it, under certain constraints. Our procedure has the advantage of modelling the possible alignments accurately, and can be used on small corpora. We introduce this model in the next section. Section 3 describes a particular use of this model to find term translations, and presents the results we obtained for this task on a small corpus. Finally, the main features of our work and the research directions we envisage are summarized in the conclusion.

2 Alignments and flow networks

Let us first consider the following aligned sentences, with the actual alignment between words: