Building Language Models Research Articles

Purpose: To examine the views of individuals with neurodegenerative diseases about ethical issues related to incorporating personalized language models into brain-computer interface (BCI) communication technologies.

Methods: Fifteen semi-structured interviews and 51 online free response surveys were completed with individuals diagnosed with neurodegenerative disease that could lead to loss of speech and motor skills. Each participant responded to questions after six hypothetical ethics vignettes were presented that address the possibility of building language models with personal words and phrases in BCI communication technologies. Data were analyzed with consensus coding, using modified grounded theory.

Results: Four themes were identified. (1) The experience of a neurodegenerative disease shapes preferences for personalized language models. (2) An individual’s identity will be affected by the ability to personalize the language model. (3) The motivation for personalization is tied to how relationships can be helped or harmed. (4) Privacy is important to people who may need BCI communication technologies. Responses suggest that the inclusion of personal lexica raises ethical issues. Stakeholders want their values to be considered during development of BCI communication technologies.

Conclusions: With the rapid development of BCI communication technologies, it is critical to incorporate feedback from individuals regarding their ethical concerns about the storage and use of personalized language models. Stakeholder values and preferences about disability, privacy, identity and relationships should drive design, innovation and implementation.

Implications for rehabilitation: Individuals with neurodegenerative diseases are important stakeholders to consider in development of natural language processing within brain-computer interface (BCI) communication technologies. The incorporation of personalized language models raises issues related to disability, identity, relationships, and privacy. People who may one day rely on BCI communication technologies care not just about usability of communication technology but about technology that supports their values and priorities. Qualitative ethics-focused research is a valuable tool for exploring stakeholder perspectives on new capabilities of BCI communication technologies, such as the storage and use of personalized language models.

We present a widely applicable methodology to bring machine translation (MT) to under-resourced languages in a cost-effective and rapid manner. Our proposal relies on web crawling to automatically acquire parallel data to train statistical MT systems if any such data can be found for the language pair and domain of interest. If that is not the case, we resort to (1) crowdsourcing to translate small amounts of text (hundreds of sentences), which are then used to tune statistical MT models, and (2) web crawling of vast amounts of monolingual data (millions of sentences), which are then used to build language models for MT. We apply these to two respective use-cases for Croatian, an under-resourced language that has gained relevance since it recently attained official status in the European Union. The first use-case regards tourism, given the importance of this sector to Croatia’s economy, while the second has to do with tweets, due to the growing importance of social media. For tourism, we crawl parallel data from 20 web domains using two state-of-the-art crawlers and explore how to combine the crawled data with bigger amounts of general-domain data. Our domain-adapted system is evaluated on a set of three additional tourism web domains and it outperforms the baseline in terms of automatic metrics and/or vocabulary coverage. In the social media use-case, we deal with tweets from the 2014 edition of the soccer World Cup. We build domain-adapted systems by (1) translating small amounts of tweets to be used for tuning by means of crowdsourcing and (2) crawling vast amounts of monolingual tweets. These systems outperform the baseline (Microsoft Bing) by 7.94 BLEU points (5.11 TER) for Croatian-to-English and by 2.17 points (1.94 TER) for English-to-Croatian on a test set translated by means of crowdsourcing. A complementary manual analysis sheds further light on these results.
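The monolingual half of this recipe, crawling large amounts of target-language text and building a language model from it, can be illustrated with a toy n-gram model. The sketch below is not the authors' toolchain (the abstract does not name one); it is a minimal add-one-smoothed bigram model in Python, and the corpus, function names, and example sentences are all invented for illustration. It only shows why in-domain monolingual text helps: a model trained on it scores fluent in-domain word order higher than a scrambled alternative.

```python
import math
from collections import Counter

BOS, EOS = "<s>", "</s>"  # sentence boundary markers (illustrative choice)


def tokenize(sentence):
    """Lowercase, split on whitespace, and add boundary markers."""
    return [BOS] + sentence.lower().split() + [EOS]


def train_bigram_lm(sentences):
    """Collect context and bigram counts from a monolingual corpus."""
    context_counts = Counter()
    bigram_counts = Counter()
    vocab = set()
    for sentence in sentences:
        tokens = tokenize(sentence)
        vocab.update(tokens)
        for prev, word in zip(tokens, tokens[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, word)] += 1
    return context_counts, bigram_counts, vocab


def log_prob(sentence, model):
    """Natural-log probability of a sentence under add-one smoothing."""
    context_counts, bigram_counts, vocab = model
    v = len(vocab) + 1  # one extra slot reserves mass for unseen words
    tokens = tokenize(sentence)
    total = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        numerator = bigram_counts[(prev, word)] + 1
        denominator = context_counts[prev] + v
        total += math.log(numerator / denominator)
    return total


# Hypothetical stand-in for crawled in-domain (tourism) monolingual text.
toy_corpus = [
    "the hotel is near the beach",
    "the beach is near the old town",
    "the old town has a hotel",
]
model = train_bigram_lm(toy_corpus)

fluent = "the hotel is near the old town"
scrambled = "town the near hotel old is the"
fluent_score = log_prob(fluent, model)
scrambled_score = log_prob(scrambled, model)
```

In a statistical MT system, a score of this kind is one of several features used to rank candidate translations; real systems use higher-order n-grams, better smoothing, and far more data than this toy example.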

Articles published on Building Language Models

On Using Self-Report Studies to Analyze Language Models

Fast Exploring Literature by Language Machine Learning for Perovskite Solar Cell Materials Design

Ethical issues raised by incorporating personalized language models into brain-computer interface communication technologies: a qualitative study of individuals with neurological disease

Arabic Text Steganography Based on Deep Learning Methods

Automated Source Code Generation and Auto-Completion Using Deep Learning: Comparing and Discussing Current Language Model-Related Approaches

Methods and Components of Natural Language Processing

Crawl and crowd to bring machine translation to under-resourced languages

A study of n-gram and decision tree letter language modeling methods
