Rare Words Research Articles

The recognition and translation of organization names (ONs) is challenging because of their complex structures and high variability. ONs consist not only of common generic words but also of names, rare words, abbreviations, and business and industry jargon. ONs are a sub-class of named entity (NE) phrases, which convey key information in text, so their correct translation is critical for machine translation and cross-lingual information retrieval. Existing Chinese–Uyghur neural machine translation systems perform poorly on ON translation tasks. Because no Chinese–Uyghur ON translation corpus is publicly available, one is developed here, comprising 191,641 ON translation pairs. A word segmentation approach involving characterization, tagged characterization, byte pair encoding (BPE) and syllabification is proposed for ON translation tasks. A recurrent neural network (RNN) attention framework and a transformer are adapted for ON translation with different sequence granularities. The experimental results indicate that the transformer model not only outperforms the RNN attention model but also benefits from the proposed word segmentation approach. In addition, a Chinese–Uyghur ON translation system is developed to generate new translation pairs automatically. This work significantly improves Chinese–Uyghur ON translation and can be applied to improve Chinese–Uyghur machine translation and cross-lingual information retrieval. It can also be extended easily to other agglutinative languages.

The article addresses the lexicographic interpretation of the poetic language of V. A. Zhukovsky (1783–1852) in the form of an author's dictionary, which became possible after the publication of the initial volumes of the writer's “Complete Works and Letters”. Special attention is paid to poetic neologisms in the works of the first Russian romanticist. The question of a typology of individual author's lexemes is raised. A distinction is drawn between occasionalisms proper, rare words, and poeticisms, the last of which became such through the “replication” by many writers of their predecessors' successful and historically promising neologisms. The occasional units that occur in Zhukovsky's translation of Homer's poem “The Odyssey” and perform specific aesthetic functions are analyzed. The article presents the results of a comparative analysis of Zhukovsky's poetic language with the lexicon of poets of the 18th–20th centuries (N. M. Karamzin, I. I. Dmitriev, D. V. Davydov, K. N. Batyushkov, P. A. Vyazemsky, N. M. Yazykov, A. I. Polezhaev, D. V. Venevitinov, M. Yu. Lermontov and others). The novelty of the research lies in the fact that the neological layer of Zhukovsky's poetic language is examined for the first time. The study is relevant because its observations, facts and conclusions can be used in the theory and practice of lexicography, in particular when compiling explanatory, historical and author's dictionaries.

Articles published on Rare Words

The Rare Word Issue in Natural Language Generation: A Character-Based Solution

From context-aware to knowledge-aware: Boosting OOV tokens recognition in slot tagging with background knowledge

Two halves of a meaningful text are statistically different

A hybrid naïve Bayes based on similarity measure to optimize the mixed-data classification

WITHDRAWN: Improving feature engineering by fine tuning the parameters of Skip gram model

The newspaper “Babochka” (1829–1832) as a linguistic source

Burstiness-Aware Web Search Analysis on Different Levels of Evidences

Re-Transformer: A Self-Attention Based Model for Machine Translation

On some sources of Sagang Sechen’s Teachings (1662)

Topic Modeling in Embedding Spaces

Analyzing Subword Techniques to Improve English to Sinhala Neural Machine Translation

Budowa i zastosowania korpusu monitorującego MoncoPL

How the Bible Is Written

A Neural-Network-Based Approach to Chinese–Uyghur Organization Name Translation

Authorship Attribution of Short Historical Arabic Texts using Stylometric Features and a KNN Classifier with Limited Training Data

Occasional and Rare Lexemes in the Poetic Language of V. A. Zhukovsky (Methodology of Dictionary Interpretation)

Rare Feature Selection in High Dimensions

Sacred, Profane, Troublesome, Adventurous

Jargon use in Public Understanding of Science papers over three decades.

Morphological Segmentation to Improve Crosslingual Word Embeddings for Low Resource Languages
