Word Alignment Research Articles

Purpose: This paper aims to describe the structure of an aligned Serbian-German literary corpus (SrpNemKor) contained in the digital library Bibliša. The goal of the research was to create a benchmark Serbian-German annotated corpus searchable with various query expansions.

Design/methodology/approach: The presented research focuses on the enhancement of bilingual search queries in a full-text search of the aligned SrpNemKor collection. The enhancement is based on existing lexical resources such as Serbian morphological electronic dictionaries and the bilingual lexical database Termi.

Findings: For the purpose of this research, the lexical database Termi was enriched with a bilingual list of German-Serbian translated pairs of lexical units. The list of correct translation pairs was extracted from SrpNemKor, evaluated and integrated into Termi. In addition, Serbian morphological e-dictionaries were updated with new entries extracted from the Serbian part of the corpus.

Originality/value: A bilingual search of SrpNemKor in Bibliša is available within a user-friendly platform. The enriched database Termi enables semantic enhancement and refinement of the user's search query based on synonyms in both Serbian and German. Serbian morphological e-dictionaries facilitate the morphological expansion of search queries in Serbian. This enables the analysis of concepts and concept structures by identifying the terms assigned to a concept and by establishing relations between terms in Serbian and German, which makes Bibliša a valuable Web tool that can support research on and analysis of SrpNemKor.

With the increase in translation demand, the advancement of information technology, the development of linguistic theories and the progress of natural language understanding models in artificial intelligence research, machine translation has gradually gained worldwide attention. However, machine translation research still faces problems such as insufficient bilingual data and a lack of effective feature representations, which hinder further improvement of key modules of machine translation such as word alignment, sequence adjustment and translation modelling. As a result, the quality of machine translation is still unsatisfactory. As a new machine learning method, deep neural networks can automatically learn abstract feature representations and establish a complex mapping between input and output signals, which provides a new idea for statistical machine translation research. Firstly, a multi-layer neural network is combined with an undirected probabilistic graphical model, and the similarity and context information of vocabulary are used to model word alignment more fully, yielding the word alignment model named NNWAM. Secondly, the low-dimensional feature representation is combined with other features in a linear-ordering pre-ordering model, yielding the pre-ordering model named NNPR. Finally, the word alignment model and the pre-ordering model are combined in the same deep neural network framework to form DNNAPM, a statistical machine translation model based on deep neural networks. The experimental results show that the statistical machine translation model based on deep neural networks performs better, converges faster and is more reliable than the comparison models.

Articles published on Word Alignment

Two approaches to compilation of bilingual multi-word terminology lists from lexical resources

Words and deeds – Experimental evidence on leading-by-example

Towards Better Word Alignment in Transformer

Multi-Information Spatial–Temporal LSTM Fusion Continuous Sign Language Neural Machine Translation

Improving English-Arabic statistical machine translation with morpho-syntactic and semantic word class

Future-Aware Knowledge Distillation for Neural Machine Translation

A Deep Neural Network Framework for English Hindi Question Answering

Topic sensitive image descriptions

An Automatic and a Machine-assisted Method to Clean Bilingual Corpus

Using Parallel Corpora and Expanding Contrastive Research on Modern Korean-Japanese Vocabulary

Issues In Word Alignment From Hindi-English Languages

Identifying word evolution by incorporating PoS and avoiding alignment of temporal words

Bilingual lexical extraction based on word alignment for improving corpus search

Annotated Guidelines and Building Reference Corpus for Myanmar-English Word Alignment

Research on statistical machine translation model based on deep neural network

Language Games in Social Media

GBI Treebanks as a Resource for New Applications

TopicEq: A Joint Topic and Mathematical Equation Model for Scientific Texts

Holistic word context does not influence holistic processing of artificial objects in an interleaved composite task.

FAST: Fast and Accurate Synoptic Texts
