Multilingual Text Research Articles

Due to the scarcity of available annotations in the biomedical domain, clinical natural language processing poses a substantial challenge, especially when applied to low-resource languages. This paper presents our contributions for the detection and normalization of clinical entities corresponding to symptoms, signs, and findings present in multilingual clinical texts. For this purpose, the three subtasks proposed in the SympTEMIST shared task of the Biocreative VIII conference have been addressed. For Subtask 1-named entity recognition in a Spanish corpus-an approach focused on BERT-based model assemblies pretrained on a proprietary oncology corpus was followed. Subtasks 2 and 3 of SympTEMIST address named entity linking (NEL) in Spanish and multilingual corpora, respectively. Our approach to these subtasks followed a classification strategy that starts from a bi-encoder trained by contrastive learning, for which several SapBERT-like models are explored. To apply this NEL approach to different languages, we have trained these models by leveraging the knowledge base of domain-specific medical concepts in Spanish supplied by the organizers, which we have translated into the other languages of interest by using machine translation tools. The results obtained in the three subtasks establish a new state of the art. Thus, for Subtask 1 we obtain precision results of 0.804, F1-score of 0.748, and recall of 0.699. For Subtask 2, we obtain performance gains of up to 5.5% in top-1 accuracy when the trained bi-encoder is followed by a WNT-softmax classification layer that is initialized with the mean of the embeddings of a subset of SNOMED-CT terms. For Subtask 3, the differences are even more pronounced, and our multilingual bi-encoder outperforms the other models analyzed in all languages except Swedish when combined with a WNT-softmax classification layer. Thus, the improvements in top-1 accuracy over the best bi-encoder model alone are 13% for Portuguese and 13.26% for Swedish. Database URL: https://doi.org/10.1093/database/baae087.

Read full abstract

The ideal multilingual health information website is relatable to all readers. Natives and immigrants should have a culturally adapted website in their language hosted by their place of residence that imparts the facts and incites a call to action to improve health, and ultimately reduces disease in the diverse community. The writer’s word choices and attitude as conveyed through the text influence the reader’s decision-making process. This paper will examine the differences between the non-translated and translated English versions of multilingual health information websites on HIV and TB diagnostic testing. These samples pertain to a large interdisciplinary study whose purpose is to determine whether the multilingual health communication websites are appropriately written regarding health literacy, and whether each cultural population, in terms of language adaptation, would receive the health information as intended. The study questions whether there exist differences between the translated and non-translated texts in English, Spanish and Catalan; this paper focuses on the two English sub-corpora. A comparable corpus of seventy-three multilingual health information websites underwent a quantitative and a qualitative analysis. The methodology is based on adaptations of Clerehan et al.’s (2005) Evaluative Linguistic Framework to assess the writer-reader relationship. The findings show statistically significant differences between the two English subcorpora as regards the writers’ and translators’ approach; the non-translated English sub-corpus contained more relational and engagement markers, whereas the translated English sub-corpus had more persuasion markers. These results should serve researchers and professionals in the translation and language sciences as well as the public health field for, respectively, future studies and techniques to improve the composition of multilingual health information texts in culturally diverse countries.

Read full abstract

Multilingual Text Research Articles

Related Topics

Articles published on Multilingual Text

Analyzing Multilingual French and Russian Text using NLTK, spaCy, and Stanza

At the Intersection of Translation Studies and Textual Scholarship

Käännös- ja tekstuaalitieteiden risteyksessä

Artificial Intelligence and Linguistic Landscape research

Translation as a restoration: Turkish translations of the Venice Charter

Learning-based short text compression using BERT models

Code-Switching in Translation: Linguistic Analysis of Multilingual Texts and their Translations

Deep Learning for Stylometry and Authorship Attribution: a Review of Literature

Rethinking Multilingual Scene Text Spotting: A Novel Benchmark and a Character-Level Feature Based Approach

Recognition and normalization of multilingual symptom entities using in-domain-adapted BERT models and classification layers.

GPT is an effective tool for multilingual psychological text analysis

A Hybrid Scene Text Script Identification Network for Regional Indian Languages

DEALING WITH LANGUAGE VARIATION IN A SOURCE TEXT: MALAY SKETCHES AND ITS MALAY TRANSLATION

Cross-cultural adaptation of the writer-reader relationship in non-translated and translated English health information websites on HIV and TB diagnostic testing

Особенности модификаций фразеологизмов в современных англо-, немецко-, франко- и испаноязычных версиях Библии: лингвокультурологический аспект

Male product endorsers and the embodiment of modern Thai masculinities in skincare advertisements

Deep Learning Techniques for Detecting and Segmenting Text in Natural Scene Images: Review

A multimodal contrastive analysis of regulations and instructions during the COVID-19 lockdown in the context of the Island of Madeira and the United Kingdom

Towards Media Monitoring: Detecting Known and Emerging Topics through Multilingual and Crosslingual Text Classification

Semantically Enriched Cross-Lingual Sentence Embeddings for Crisis-related Social Media Texts

Lead the way for us

Editage

Paperpal

R Discovery

Mind the Graph

Multilingual Text Research Articles

Related Topics

Articles published on Multilingual Text

Analyzing Multilingual French and Russian Text using NLTK, spaCy, and Stanza

At the Intersection of Translation Studies and Textual Scholarship

Käännös- ja tekstuaalitieteiden risteyksessä

Artificial Intelligence and Linguistic Landscape research

Translation as a restoration: Turkish translations of the Venice Charter

Learning-based short text compression using BERT models

Code-Switching in Translation: Linguistic Analysis of Multilingual Texts and their Translations

Deep Learning for Stylometry and Authorship Attribution: a Review of Literature

Rethinking Multilingual Scene Text Spotting: A Novel Benchmark and a Character-Level Feature Based Approach

Recognition and normalization of multilingual symptom entities using in-domain-adapted BERT models and classification layers.

GPT is an effective tool for multilingual psychological text analysis

A Hybrid Scene Text Script Identification Network for Regional Indian Languages

DEALING WITH LANGUAGE VARIATION IN A SOURCE TEXT: MALAY SKETCHES AND ITS MALAY TRANSLATION

Cross-cultural adaptation of the writer-reader relationship in non-translated and translated English health information websites on HIV and TB diagnostic testing

Особенности модификаций фразеологизмов в современных англо-, немецко-, франко- и испаноязычных версиях Библии: лингвокультурологический аспект

Male product endorsers and the embodiment of modern Thai masculinities in skincare advertisements

Deep Learning Techniques for Detecting and Segmenting Text in Natural Scene Images: Review

A multimodal contrastive analysis of regulations and instructions during the COVID-19 lockdown in the context of the Island of Madeira and the United Kingdom

Towards Media Monitoring: Detecting Known and Emerging Topics through Multilingual and Crosslingual Text Classification

Semantically Enriched Cross-Lingual Sentence Embeddings for Crisis-related Social Media Texts