Related Topics

  • Speaking Style

Articles published on Diverse Speakers

65 Search results
  • Research Article
  • Cite count: 1
  • 10.1162/tacl.a.628
VoiceBench: Benchmarking LLM-Based Voice Assistants
  • Apr 1, 2026
  • Transactions of the Association for Computational Linguistics
  • Yiming Chen + 5 more

Recent advancements in large language models (LLMs) like GPT-4o have enabled real-time speech interactions through LLM-based voice assistants, offering an improved user experience over text-based interactions. However, a suitable benchmark to rigorously evaluate such speech interaction systems is currently lacking. To bridge this gap, we introduce VoiceBench, the first benchmark specifically designed to assess LLM-based voice assistants. VoiceBench comprises 6,783 synthetic and real spoken instructions recorded from diverse speakers across eight distinct tasks. These instructions are meticulously crafted to assess three crucial capability areas: general knowledge, instruction-following, and safety compliance. Furthermore, VoiceBench systematically incorporates realistic variations common in spoken interactions, including differences in speaker characteristics (e.g., accents), heterogeneous environmental conditions (e.g., reverberation), and content complexities such as mispronunciations. Extensive experiments reveal the limitations of current LLM-based voice assistant models and offer valuable insights for future research and development in this field.

  • Research Article
  • 10.1002/ohn.70193
Patient Perceived Experience of Automated Voice Assistants in Oral Cavity Cancer.
  • Mar 10, 2026
  • Otolaryngology--head and neck surgery : official journal of American Academy of Otolaryngology-Head and Neck Surgery
  • Hannah Baratz + 11 more


  • Research Article
  • 10.3390/electronics15010239
AirSpeech: Lightweight Speech Synthesis Framework for Home Intelligent Space Service Robots
  • Jan 5, 2026
  • Electronics
  • Xiugong Qin + 5 more

Text-to-Speech (TTS) methods typically employ a sequential approach with an Acoustic Model (AM) and a vocoder, using a Mel spectrogram as an intermediate representation. However, in home environments, TTS systems often struggle with issues such as inadequate robustness against environmental noise and limited adaptability to diverse speaker characteristics. The quality of the Mel spectrogram directly affects the performance of TTS systems, yet existing methods overlook the potential of enhancing Mel spectrogram quality through more comprehensive speech features. To address the complex acoustic characteristics of home environments, this paper introduces AirSpeech, a post-processing model for Mel-spectrogram synthesis. We adopt a Generative Adversarial Network (GAN) to improve the accuracy of Mel spectrogram prediction and enhance the expressiveness of synthesized speech. By incorporating additional conditioning extracted from synthesized audio using specified speech feature parameters, our method significantly enhances the expressiveness and emotional adaptability of synthesized speech in home environments. Furthermore, we propose a global normalization strategy to stabilize the GAN training process. Through extensive evaluations, we demonstrate that the proposed method significantly improves the signal quality and naturalness of synthesized speech, providing a more user-friendly speech interaction solution for smart home applications.

  • Research Article
  • 10.1080/19463014.2025.2576925
Oral interaction patterns in work-related activities in teaching and interviews for language practice placements: L2 teaching for adults in Sweden
  • Dec 8, 2025
  • Classroom Discourse
  • Robert Walldén + 1 more

This study explores oral interaction in work-related activities within adult second-language (L2) education in Sweden, specifically in the context of Swedish for Immigrants (SFI). Drawing on data from classroom instruction and placement interviews, the study investigates which speech functions and speaker roles are made available to students preparing for workplace language practice. Transcribed material from one lesson and four placement interviews was analysed using speech act theory, with a focus on speech functions, speaker roles, and opportunities for extended talk. The findings show that while interaction was often regulated by teachers and recruiters, students actively contributed through humour, storytelling, meta-talk, and expressions of preference or reservation. Students assumed diverse speaker roles, including novice, knower, peer, and tension releaser, thereby engaging in pragmatically and socially relevant communication. These roles and functions are important for both general language development and workplace interaction. The study highlights the need to create space in L2 classrooms for less typical but crucial functions – such as turn-taking, declining, and expressing disagreement – often underrepresented in classroom discourse. It contributes to research on adult L2 education by demonstrating how structured interaction around work placements can support learners’ oral proficiency and pragmatic competence.

  • Research Article
  • 10.1038/s41597-025-06020-6
A richly annotated dataset of co-speech hand gestures across diverse speaker contexts
  • Nov 5, 2025
  • Scientific Data
  • Laura B Hensel + 2 more

Hand gestures form an integral part of human communication and their complexity makes their study and generation difficult. Here, we present a dataset comprising 2373 annotated gestures, designed to facilitate in-depth analysis of human communication. We captured these gestures from nine speakers across three distinct categories: University lecturers, Politicians, and Psychotherapists. The annotations encompass various aspects, including gesture types (e.g., metaphoric, iconic), descriptive terms characterizing each gesture (e.g., ‘sweep’, ‘container’), and their corresponding verbal utterances. The dataset also includes detailed physical properties such as hand height, distance to the body, arm angle, hand configuration, palm orientation, repetitions, size, and speed, alongside 3D pose tracking data. Where possible, video recordings provide additional multimodal context. Notably, we identified several previously undocumented lexemes, expanding the current lexicon of gesture research. This dataset offers a valuable resource for studying human communication, training models for gesture recognition and generation, and designing socially intelligent virtual agents.

  • Research Article
  • 10.21015/vtse.v13i3.2174
Vocal Sentiments: Transformer Based Speech Emotion Recognition
  • Sep 27, 2025
  • VFAST Transactions on Software Engineering
  • Didar Ali + 3 more

Speech Emotion Recognition (SER) plays a crucial role in Human–Computer Interaction (HCI) by enabling systems to interpret and respond to human emotions through speech analysis. This paper presents a Transformer-based SER framework that leverages the Wav2Vec2 model for self-supervised representation learning. Unlike conventional approaches relying on handcrafted acoustic features or shallow learning, our approach employs transfer learning to extract high-level contextual embeddings from raw audio. We integrate two benchmark datasets, RAVDESS and TESS, to improve generalization across diverse speakers and emotions, and further analyze system robustness by introducing varying levels of environmental noise. The proposed model achieves an accuracy of 79.01%, with balanced precision, recall, and F1-scores, demonstrating competitive performance compared with recent state-of-the-art SER models. The main contributions of this work are threefold: (i) a novel evaluation of Wav2Vec2 embeddings on combined RAVDESS–TESS data, (ii) a systematic assessment of noise robustness in Transformer-based SER, and (iii) a comprehensive benchmark that highlights the strengths and limitations of transfer learning in practical emotion recognition scenarios. These findings suggest broad applicability in voice assistants, call-center analytics, and mental health monitoring, while future extensions may incorporate multimodal data and advanced fine-tuning strategies to further enhance performance.

  • Research Article
  • 10.1075/eww.25018.coa
The YouTube corpus of Singapore English podcasts
  • Sep 9, 2025
  • English World-Wide
  • Steven Coats + 3 more

Recent advances in streaming protocols and automatic speech recognition (ASR) have enabled large-scale spoken language corpora, yet research on Singapore English remains constrained by small or text-based datasets. The YouTube Corpus of Singapore English Podcasts (YCSEP) addresses this gap with 620 hours of transcribed, diarized speech from over 1,300 podcast episodes by Singapore-based content creators. YCSEP supports the empirical analysis of phonetics, morphosyntax, and discourse, enabling the study of low-frequency features like discourse particles and reduplication. The dataset reflects informal, spontaneous speech from diverse speakers and facilitates investigation into nativization and endonormative stabilization processes in postcolonial English. Built using a pipeline of yt-dlp, WhisperX, and Pyannote, YCSEP offers robust empirical grounding for linguistic features such as verb complementation and modality. It also contributes to broader theoretical discussions on areal norms and construction grammar in World Englishes.

  • Research Article
  • 10.58851/africania.1654345
YORUBA LANGUAGE IN CYBERSPACE: ISSUES AND VIABLE OPTIONS FOR VITALITY
  • Jul 30, 2025
  • Africania
  • Julianah Akindele + 2 more

Despite its widespread offline use and vast data repository, Yorùbá remains a low-resource language, and most digital platforms do not cater to the linguistic needs of its diverse speakers. The present study examined the determinants of the hegemonic use of the English language and the continuous displacement of the Yorùbá language in digital spaces. It employed a descriptive quantitative research design and a purposive sampling technique to elicit data from 500 randomly selected netizens across the Yorùbá-speaking states of Lagos, Oyo, Ogun, Ondo, Osun, and Ekiti. The study is anchored on Kachru's (1985) Model of the three concentric circles of English. The findings indicate that while not thriving, the Yorùbá language is considered suitable to interpret technical and scientific thoughts (67.8%) but its usage is grossly deficient because social capital is associated with English, Nigeria’s most dominant foreign and participants’ professed language of upward mobility (93.4%). Overall, the study reinforces extant studies’ concerns regarding digital colonialism and shifts into a monolingual society where linguistic diversity and low-resourced languages are gravely excluded, particularly in the digitised landscape. The study concludes that sustained initiatives and user-made technological innovation, policy support, and digital engagement are crucial to dipping Yorùbá language displacement in particular and advancing the sustainability of indigenous languages in the digital age in general.

  • Research Article
  • Cite count: 1
  • 10.1080/07268602.2025.2507747
Decolonizing the introductory linguistics curriculum
  • Jul 3, 2025
  • Australian Journal of Linguistics
  • Celeste Rodríguez Louro + 4 more

Introductory linguistics courses sometimes bypass the voices of diverse speakers, signers and writers: members of these groups do not often speak directly to students, and contributions from diverse language scholars are often overlooked. Many learning materials also present languages as unitary and unchanging by discussing single “standard” varieties without comment. This unintended bias results in students receiving an inaccurate and socially exclusionary picture of language. In this paper, we reflect on our work to decolonize two introductory linguistics units at the University of Western Australia by extending their focus to include diverse scholarly voices. We focus on three changes recently implemented in our programme: (1) adding more work from under-acknowledged thinkers into the learning materials, thereby presenting a broader and more accurate history of linguistic thought (e.g. Pāṇini: Sanskrit; Sejong: Korean; Sequoyah: Cherokee; Bell: Yagera/Dulingbara; Zobule: Luqa/Kubokota); (2) crafting tutorial activities in which students critically examine their own definitions of “language” and who counts as a “language user” (i.e. speaker, signer or writer), while also exploring variation as a natural consequence of use and standardization as intentional; and (3) incorporating diverse teaching perspectives by inviting community members to speak directly to students (e.g. through sign language lectures delivered in Auslan and language development lectures from a practising speech pathologist). These changes seek to afford students a richer, more just introduction to language science. Furthermore, by making these changes within first-year units, we aim to strengthen the diversity and inclusiveness of our programme over the long term.

  • Research Article
  • 10.32628/ijsrst25123148
Systematic Evaluation of Deep Learning Paradigms for Speech Emotion Recognization Using Diverse Audio Sources
  • Jun 20, 2025
  • International Journal of Scientific Research in Science and Technology
  • Yogeshkumar Prajapati Degadwala + 2 more

Speech emotion identification is one of the most difficult areas of human-computer interaction, with significant ramifications for assistive technologies, customer support, and mental health monitoring. Despite significant advances in machine learning, accurately identifying emotional states from speech remains difficult due to the complex, nuanced nature of vocal emotional expressions across diverse speakers and contexts. This study presents a comprehensive evaluation of Speech Emotion Recognition (SER) systems across multiple machine learning paradigms using four benchmark datasets (CREMA-D, RAVDESS, SAVEE, and TESS). We implement a multi-feature extraction approach incorporating prosodic, spectral, and voice quality features, while employing data augmentation techniques to enhance model robustness. Our investigation spans traditional machine learning algorithms, ensemble methods, and deep learning architectures including CNN and RNN implementations. Performance evaluation reveals the superiority of the Stacking Classifier (accuracy: 72.54%, F1-score: 72.47%), with strong performances from Random Forest (68.31% accuracy) and ResNet (66% accuracy). This comparative analysis advances affective computing by providing detailed insights into the effectiveness of various approaches for emotion recognition in speech, with significant implications for developing more sophisticated emotional intelligence systems.

  • Research Article
  • 10.4300/jgme-d-24-00879.1
Implementation and Impact of a Graduate Medical Education Program Director Bootcamp.
  • Jun 1, 2025
  • Journal of graduate medical education
  • Abby L Spencer + 2 more

The Accreditation Council for Graduate Medical Education (ACGME) mandates that a single program director (PD) has the authority, accountability, and responsibility for the program and requires that PDs serve long enough to ensure program stability. High PD turnover is reported in multiple medical specialties; among internal medicine (IM) PDs in 2023, nearly half had been in the role less than 3 years. PD attrition is a key performance indicator in program accreditation, and turnover affects trainees and the sponsoring institution by contributing to program instability and poor performance. There are increasing challenges for PDs stemming from financial pressures, competing priorities, accreditation requirements, and well-being threats to faculty and trainees. While many national societies offer PD training to support a PD’s ability to be effective in their role, off-site program attendance is limited by financial pressures, restricted travel, and rapid PD turnover. Additionally, national conferences lack the local networking and acculturation to institutional policies, procedures, and priorities. Our goal was to implement a longitudinal PD bootcamp to better equip PDs to overcome GME leadership challenges, thrive in their roles, network, and lead successful, innovative, inclusive, and ACGME-compliant programs. After reviewing existing graduate medical education leadership curricula, we met with key stakeholders and developed a needs assessment for essential PD competencies. From this, we developed and implemented a comprehensive, innovative, interactive PD bootcamp, delivered in-person monthly by diverse speakers with expertise from procedural, non-procedural, residency, and fellowship programs, as well as from medical school and hospital legal departments, human resources, and senior administration. We evaluated each session as well as overall course outcomes via anonymous QR code post-session questionnaires.
Participants were also asked to complete an accountability notecard setting 1 to 2 goals that they wished to work toward, based on what they learned from PD bootcamp. Notecards were collected at program graduation and shared with participants at the 6-month mark after completing bootcamp. Participants were asked to share their progress toward goals since bootcamp graduation. We successfully developed and implemented an 11-month longitudinal PD bootcamp that launched in June 2022 and is now in its third successful year. Each monthly session included 1 to 2 core curriculum topics from 1:00 to 4:30 pm. In our first year, registration for the course reached capacity within an hour of opening enrollment, and the program has filled to capacity all 3 years. Since its inception, 75 PDs and aspiring PDs/associate PDs (APDs) registered for the course, and more than 95% of attendees successfully earned their certificate. Of those completing our pre-post surveys, 61% (17 of 28) of participating PDs had been in their role for less than 3 years, and 25% (7 of 28) had been in the role less than 1 year; 18% (5 of 28) of respondents were APDs. Twenty different medical/surgical specialties were represented. Sixty-four percent (28 of 64) of participants completed their post-course evaluation. One hundred percent of respondents enjoyed the course, made new connections, and found it useful. Retrospective pre/post surveys revealed that all participants felt they at least somewhat improved their knowledge/skills/confidence across all topics covered. Participants reported the greatest improvement in delivering feedback, remediation, building relationships, knowing who to call when, recruitment, and using annual program evaluation for program improvement. Approximately 50% (18 of 36) of the first cohort and 39% (11 of 28) of the second cohort completed and submitted accountability notecards at the time of graduation.
The top 3 areas for which participants set goals were improving feedback to trainees, restructuring evaluations of trainees, and engaging the program evaluation committee in the annual program evaluation process. We hope this impact enhances the quality of our training programs, the educational experience of our trainees, and the recruitment and retention of outstanding PDs and APDs. Next steps include tracking implementation of goals set during the course and continuous course improvement to meet PD and institutional needs. We believe our curriculum and structure can be easily implemented at other institutions to support their PD development, create community, and increase ability to adapt to an ever-changing and critically important leadership role.

  • Research Article
  • Cite count: 1
  • 10.1038/s41597-025-05267-3
Human voices communicating trustworthy intent: A demographically diverse speech audio dataset
  • May 31, 2025
  • Scientific Data
  • Constantina Maltezou-Papastylianou + 2 more

The multi-disciplinary field of voice perception and trustworthiness lacks accessible and diverse speech audio datasets representing diverse speaker demographics, including age, ethnicity, and sex. Existing datasets primarily feature white, younger adult speakers, limiting generalisability. This paper introduces a novel open-access speech audio dataset with 1,152 utterances from 96 untrained speakers, across white, black and south Asian backgrounds, divided into younger (N = 60, ages 18–45) and older (N = 36, ages 60+) adults. Each speaker recorded both their natural speech patterns (i.e. “neutral” or no intent) and their attempt to convey their trustworthy intent as they perceive it during speech production. Our dataset is described and evaluated through classification methods between neutral and trustworthy speech. Specifically, extracted acoustic and voice quality features were analysed using linear and non-linear classification models, achieving accuracies of around 70%. This dataset aims to close a crucial gap in the existing literature and provide additional research opportunities that can contribute to the generalisability and applicability of future research results in this field.

  • Research Article
  • Cite count: 1
  • 10.3390/languages10050098
On “Local Theory” Neutrality with Respect to “Meta-Theories” and Data from a Diversity of “Native Speakers”, Including Heritage Speaker Bilinguals: Commentary on Hulstijn (2024)
  • Apr 30, 2025
  • Languages
  • Jason Rothman + 3 more

This commentary critically engages with Hulstijn’s revised Basic Language Cognition (BLC) Theory, which aims to enhance explanatory power and falsifiability regarding individual differences (IDs) in language proficiency across native and non-native speakers. While commending BLC Theory’s emphasis on separating oral and written language cognition, we raise two key concerns. First, we question the theory’s exclusive alignment with usage-based approaches, arguing that its core constructs are, in principle, compatible with multiple meta-theoretical frameworks, including generative ones. As such, BLC Theory should remain neutral to maximize its cross-paradigmatic utility. Second, we address the theory’s treatment of heritage speaker bilinguals (HSs), particularly the implication that they may not typically acquire BLC. We contend that this position overlooks robust empirical evidence demonstrating that HSs develop systematic, rule-governed grammars influenced by their individual input and usage conditions. Moreover, we highlight how IDs among HSs can provide a valuable testing ground for BLC Theory, particularly regarding the role of input and literacy. We conclude that embracing theory neutrality and integrating diverse speaker data—especially from heritage bilinguals—can enhance BLC Theory’s generalizability, empirical relevance, and theoretical utility across language acquisition research.

  • Research Article
  • Cite count: 8
  • 10.3390/s25082366
A Novel Approach for Visual Speech Recognition Using the Partition-Time Masking and Swin Transformer 3D Convolutional Model.
  • Apr 8, 2025
  • Sensors (Basel, Switzerland)
  • Xiangliang Zhang + 6 more

Visual speech recognition is a technology that relies on visual information, offering unique advantages in noisy environments or when communicating with individuals with speech impairments. However, this technology still faces challenges, such as limited generalization ability due to different speech habits, high recognition error rates caused by confusable phonemes, and difficulties adapting to complex lighting conditions and facial occlusions. This paper proposes a lip reading data augmentation method, Partition-Time Masking (PTM), to address these challenges and improve lip reading models' performance and generalization ability. Applying nonlinear transformations to the training data enhances the model's generalization ability when handling diverse speakers and environmental conditions. A lip-reading recognition model architecture, Swin Transformer and 3D Convolution (ST3D), was designed to overcome the limitations of traditional lip-reading models that use ResNet-based front-end feature extraction networks. By adopting a strategy that combines Swin Transformer and 3D convolution, the proposed model enhances performance. To validate the effectiveness of the Partition-Time Masking data augmentation method, experiments were conducted on the LRW video dataset using the DC-TCN model, achieving a peak accuracy of 92.15%. The ST3D model was validated on the LRW and LRW1000 video datasets, achieving a maximum accuracy of 56.1% on the LRW1000 dataset and 91.8% on the LRW dataset, outperforming current mainstream lip reading models and demonstrating superior performance on challenging, easily confused samples.

  • Research Article
  • 10.54327/set2025/v5.i1.224
Fine-tuning AraGPT2 for Hierarchical Arabic Text Classification
  • Mar 17, 2025
  • Science, Engineering and Technology
  • Djelloul Bouchiha + 3 more

Text classification consists in attributing a text to its corresponding category. It is a crucial task in natural language processing (NLP), with applications spanning content recommendation, spam detection, sentiment analysis, and topic categorization. While significant advancements have been made in text classification for widely spoken languages, Arabic remains underrepresented despite its large and diverse speaker base. Another challenge is that, unlike flat classification, hierarchical text classification involves categorizing texts into a multi-level taxonomy. This adds layers of complexity, particularly in distinguishing between closely related categories within the same super-class. To tackle these challenges, we propose a novel approach using AraGPT2, a variant of the Generative Pre-trained Transformer 2 (GPT-2) model adapted specifically for Arabic. Fine-tuning AraGPT2 for hierarchical text classification leverages the model's pre-existing linguistic knowledge and adapts it to recognize and classify Arabic text according to hierarchical structures. Fine-tuning, in this context, refers to the process of training a pre-trained model on a specific task or dataset to improve its performance on that task. Our experiments and comparative study demonstrate the efficiency of our solution. The fine-tuned AraGPT2 classifier achieves a hierarchical HF score of 80.64%, outperforming the machine learning-based classifier, which scores 41.90%.
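
The abstract above reports a hierarchical score without defining it. As a rough illustration only, the sketch below assumes the commonly used ancestor-augmented hierarchical F-measure, in which each label is expanded with its ancestors in the taxonomy before precision and recall are computed. The taxonomy, labels, and function names are hypothetical and are not taken from the paper, whose exact metric may differ.

```python
def ancestors(label, parent):
    """Return the label together with all of its ancestors in the taxonomy."""
    out = set()
    while label is not None:
        out.add(label)
        label = parent.get(label)
    return out


def hierarchical_f1(true_labels, pred_labels, parent):
    """Micro-averaged hierarchical F1 over ancestor-augmented label sets."""
    overlap = pred_total = true_total = 0
    for t, p in zip(true_labels, pred_labels):
        t_set, p_set = ancestors(t, parent), ancestors(p, parent)
        overlap += len(t_set & p_set)   # nodes shared by truth and prediction
        pred_total += len(p_set)
        true_total += len(t_set)
    if overlap == 0:
        return 0.0
    hp = overlap / pred_total           # hierarchical precision
    hr = overlap / true_total           # hierarchical recall
    return 2 * hp * hr / (hp + hr)


# Hypothetical two-level taxonomy: child -> parent (top-level nodes map to None).
parent = {
    "football": "sports",
    "tennis": "sports",
    "elections": "politics",
    "sports": None,
    "politics": None,
}

# One prediction lands in the right super-class but the wrong leaf.
score = hierarchical_f1(["football", "elections"], ["tennis", "elections"], parent)
print(round(score, 2))
```

Under this assumed definition, a prediction that picks the wrong leaf inside the correct super-class still earns partial credit, which a flat accuracy score would not give.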

  • Research Article
  • Cite count: 2
  • 10.54097/1f2j6n73
The Role of Social Media in Informal English Learning: A Case Study of Language Learning Communities
  • Feb 14, 2025
  • International Journal of Education and Humanities
  • Wenting Zhu

This paper explores the role of social media in informal English learning, with a focus on language learning communities. In the digital era, social media has transformed the landscape of language acquisition. Traditional learning methods are being augmented by the vast opportunities social media offers. Using a case-study approach, data is collected from multiple language learning communities on platforms like Facebook, Instagram, and WhatsApp. The research combines literature review, online community observation, and surveys/interviews. Results indicate that social media significantly promotes informal English learning. It provides a global stage for learners to interact with diverse speakers, offers instant feedback, and enriches learning through various content forms. Language learning communities on social media foster high-level interactivity, enhancing learners' motivation and language skills. They also facilitate cultural exchange, which is essential for a comprehensive understanding of the language. However, challenges such as unstructured learning and misinformation exist. Overall, social media holds great potential for informal English learning, and future research should focus on optimizing its use.

  • Research Article
  • Cite count: 13
  • 10.1177/20539517241303118
Toward cultural interpretability: A linguistic anthropological framework for describing and evaluating large language models
  • Jan 29, 2025
  • Big Data & Society
  • Graham M Jones + 2 more

This article proposes a new integration of linguistic anthropology and machine learning (ML) around convergent interests in both the underpinnings of language and making language technologies more socially responsible. While linguistic anthropology focuses on interpreting the cultural basis for human language use, the ML field of interpretability is concerned with uncovering the patterns that Large Language Models (LLMs) learn from human verbal behavior. Through the analysis of a conversation between a human user and an LLM-powered chatbot, we demonstrate the theoretical feasibility of a new, conjoint field of inquiry, cultural interpretability (CI). By focusing attention on the communicative competence involved in the way human users and AI chatbots coproduce meaning in the articulatory interface of human-computer interaction, CI emphasizes how the dynamic relationship between language and culture makes contextually sensitive, open-ended conversation possible. We suggest that, by examining how LLMs internally “represent” relationships between language and culture, CI can: (1) provide insight into long-standing linguistic anthropological questions about the patterning of those relationships; and (2) aid model developers and interface designers in improving value alignment between language models and stylistically diverse speakers and culturally diverse speech communities. Our discussion proposes three critical research axes: relativity, variation, and indexicality.

  • Research Article
  • Cite count: 17
  • 10.1016/j.ejrad.2024.111827
Multilingual feasibility of GPT-4o for automated Voice-to-Text CT and MRI report transcription.
  • Jan 1, 2025
  • European journal of radiology
  • Felix Busch + 6 more

Large language models (LLMs) promise to streamline radiology reporting. With the release of OpenAI's GPT-4o (Generative Pre-trained Transformers-4 omni), which processes not only text but also speech, multimodal LLMs might now also be used as medical speech recognition software for radiology reporting in multiple languages. This proof-of-concept study investigates the feasibility of using GPT-4o for automated voice-to-text transcription of radiology reports in English and German. Three readers with varying levels of experience each dictated 100 synthetic radiology reports in both languages using GPT-4o via the ChatGPT iOS mobile application. Reports included CT and MRI scans of various anatomical regions. Evaluation metrics included error type, severity, and correction time. BERTScore and ROUGE metrics were calculated to assess semantic similarity and n-gram overlap between dictated and original reports. No significant differences in correction time between languages were found, but differences were observed between readers based on experience. Error rates were similar for both languages, with most errors being minor (92.68%, n=114/123 German; 94.74%, n=90/95 English) and technical (27.04%, n=43/159 German; 35.65%, n=41/115 English) or typographical (23.9%, n=38/159 German; 27.83%, n=32/115 English). BERTScore metrics were significantly higher for German, while ROUGE metrics showed no significant differences between languages. This study demonstrates the potential of GPT-4o for multilingual transcription of radiology reports, effectively handling both English and German with minimal errors and high semantic understanding. Future research should compare GPT-4o with current radiology dictation tools, assessing performance, cost-effectiveness, and multilingual capabilities across diverse speaker populations.
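As a rough illustration of the n-gram overlap that ROUGE measures, the following is a minimal sketch of ROUGE-N in plain Python. It is not the study's evaluation code: the function names `ngram_counts` and `rouge_n`, the whitespace tokenization, and the sample report phrases are all assumptions made for the example.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count every contiguous n-gram in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(reference, candidate, n=1):
    """Return ROUGE-N precision, recall, and F1 for two strings.

    Tokenization is a simple lowercase whitespace split (an assumption;
    published ROUGE implementations usually normalize text further).
    """
    ref = ngram_counts(reference.lower().split(), n)
    cand = ngram_counts(candidate.lower().split(), n)
    # Counter intersection keeps the minimum count of each shared n-gram.
    overlap = sum((ref & cand).values())
    ref_total = sum(ref.values())
    cand_total = sum(cand.values())
    recall = overlap / ref_total if ref_total else 0.0
    precision = overlap / cand_total if cand_total else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical original and dictated phrases, differing in one spelling.
original = "no acute intracranial hemorrhage"
dictated = "no acute intracranial haemorrhage"
scores = rouge_n(original, dictated, n=1)
```

In this sketch a single misspelled word lowers the unigram score even though the meaning is unchanged, which is one reason a semantic-similarity metric such as BERTScore is often reported alongside ROUGE.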

  • Research Article
  • 10.22232/stj.2025.13.01.18
Analysis of Lip Reading of Assamese Digits using Deep Learning
  • Jan 1, 2025
  • Science & Technology Journal
  • Rabinder Kumar Prasad + 5 more

Effective communication in noisy environments, such as aviation, construction, and manufacturing, is often hindered due to auditory challenges, making oral communication difficult. To address this issue, we propose an automatic lip-reading system specifically designed for recognizing Assamese digits in high-noise settings. This study introduces a deep learning-based approach that extracts the geometric features of lip movements from video data to accurately predict spoken digits. Traditional lip-reading models struggle with language-specific nuances due to reliance on generic datasets. To overcome this limitation, we construct a custom dataset of video recordings featuring diverse speakers varying in age, gender, and accent, ensuring a more robust and adaptable model. We employ a CNN+LSTM architecture, where Convolutional Neural Networks (CNNs) capture spatial features, and Long Short-Term Memory (LSTM) networks learn temporal dependencies. Experimental results demonstrate that our CNN+LSTM model outperforms conventional architectures like RNN+LSTM and RNN+CNN, achieving an accuracy of 83%. The findings highlight the effectiveness of deep learning in enhancing accessibility for the deaf and hard-of-hearing and enabling voice-free human-computer interaction.

  • Research Article
  • 10.55393/babylonia.v3i.391
Becoming bilingual when access to the minority language may be compromised
  • Dec 12, 2024
  • Babylonia Journal of Language Education
  • Virginia C Mueller Gathercole

Between September and December 2023, Babylonia collected questions from parents regarding their children's language development. This article aims to answer the following questions: We want our daughter to be fully bilingual - with such a high dominion of each language that people question whether she speaks any other language at all. Both my husband and I speak Spanish and English in this way, having grown up in Mexico going to an English-speaking school and then moving to the US for university and the rest of our adult lives. The actual question: how can we recreate this for our daughter, knowing that she is in the US and will not be immersed in Spanish the way we were when growing up. Plus finding Spanish-speaking child care is hard - is two days on the weekend and evenings in Spanish enough to have her be bilingual? What would you recommend we do so that we set her up for success in both languages? She is 8 months today. I am a non-heritage speaker of another language (Spanish). I can speak fluidly but still make errors that native speakers do not make. My husband and I would like our daughter - currently 2 months old - to be fluent in the second language (Spanish) and plan to enroll her in a bilingual learning environment once she is old enough. In the meantime we join a once per week bilingual storytime, and try to read her stories in Spanish at home. My question is: for language exposure & acquisition, is it better for me to try to speak Spanish to her at home if my Spanish has errors, or just wait, stick to small exposures for now, and let her learning come primarily once she has started daycare/preschool? [Summary generated by Poe - we refer the reader to the full article in PDF format for a complete answer] This article discusses the challenges of bilingual development in children, particularly in contexts where access to a minority language, such as Spanish in an English majority context, may be compromised.
Interaction with fluent speakers is essential for language learning. It is beneficial for parents to speak the minority language at home and create an environment in which its use feels natural. Regular exposure, even if limited, supports the development of language skills, and exposure to varied contexts and diverse speakers is encouraged. The author recommends access to the minority language through interactions with other adults and children who speak that language, and continued use of this language even once the child starts school. Additionally, enrolling the child in a bilingual school or a school where the minority language is used as a medium of instruction can be beneficial. Finally, she notes that parents play a key role in creating a rich language environment and promoting a positive language learning experience. In summary, the author recommends that parents: Use the minority language at home and create an environment in which it is natural for the child to use it. Expose the child to other adults and children who speak the minority language. Continue to use and expose the child to the minority language once they start school. Consider enrolling the child in a bilingual school or a school where the minority language is used as a medium of instruction. Don't give up, as there may be occasional ups and downs in the child's uptake of the language.
