Articles published on Voice Conversion
699 Search results
- Research Article
- 10.52589/ajlpra-yhquaffv
- Feb 11, 2026
- African Journal of Law Political Research and Administration
- Ezeugwu, F O + 2 more
Music has long served as a powerful medium for shaping public awareness and responding to social realities in Nigeria. Within this tradition, Onyeka Onwenu, an influential singer, journalist, and advocate known for her socially conscious artistry, emerges as a key voice in national conversations on unity and justice. This study examines how her song One Love communicates social consciousness and political advocacy within Nigeria’s socio‑political climate. Using a qualitative research design, the study undertakes a multi‑layered analysis combining semiotic examination of the song’s lyrics and music video with cultural interpretation of its public reception in media and scholarship. Semiotic analysis is used to identify the musical, visual, and textual signs through which meaning is encoded, while cultural interpretation situates these signs within Nigeria’s historical and political context. The study also draws on secondary materials to deepen contextual understanding and triangulate interpretations. The analysis is guided by Cultural Studies Theory, which explains how the song reflects and challenges dominant social narratives; Semiotic Musicology Theory, which interprets the symbolic and expressive codes embedded in the music; and Social Movement Theory, which situates One Love within broader traditions of musical activism and collective mobilisation. Findings show that One Love functions as a cultural text that critiques social division, promotes unity, and encourages civic responsibility. The study also highlights Onwenu’s role as a pioneering female artist whose work challenges patriarchal and political boundaries. Overall, the research demonstrates the enduring power of music to shape social consciousness and inspire collective action in contemporary Nigeria.
- Research Article
- 10.51583/ijltemas.2025.1412000140
- Jan 19, 2026
- International Journal of Latest Technology in Engineering Management & Applied Science
- Tanmay Mehta
Lead leakage is defined as the loss of potential patient inquiries due to delayed or absent response mechanisms and represents a significant revenue challenge for independent medical practices. This paper presents a hybrid AI-automation framework that integrates WhatsApp Business API with conversational voice AI agents to minimize response latency and improve inquiry-to-appointment conversion rates. The proposed system employs natural language understanding (NLU) for intent classification, automated acknowledgment protocols and intelligent call routing to address the temporal gaps in traditional receptionist-based workflows. Deployment across three dental practices in India demonstrated a 47% reduction in inquiry abandonment, 89% decrease in mean response time (from 2.1 hours to 7 seconds for asynchronous channels), and projected revenue recovery of ₹8.4 lakhs per practice annually. The framework's modular architecture enables adaptation across medical specialties while maintaining data privacy standards compliant with Indian regulations.
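The abstract describes NLU-based intent classification feeding automated acknowledgment and intelligent call routing. The paper's NLU model is not described here, so the sketch below substitutes a toy keyword matcher for it. The intents, keywords, routing destinations, and acknowledgment text are all hypothetical, and the code only illustrates the classify-acknowledge-route flow.

```python
import re
from dataclasses import dataclass

# Hypothetical intents and keywords; a real deployment would use a trained NLU classifier.
INTENT_KEYWORDS = {
    "book_appointment": {"appointment", "book", "schedule", "slot"},
    "pricing": {"cost", "price", "fee", "charges"},
    "emergency": {"pain", "bleeding", "swelling", "urgent"},
}

# Hypothetical routing table: intent -> destination that handles it.
ROUTES = {
    "book_appointment": "scheduling_bot",
    "pricing": "faq_bot",
    "emergency": "on_call_staff",
    "unknown": "front_desk",
}

ACK = "Thanks for contacting the clinic. We have received your message."


@dataclass
class RoutedInquiry:
    intent: str
    destination: str
    acknowledgment: str


def classify(message: str) -> str:
    """Toy keyword matcher standing in for an NLU intent classifier."""
    tokens = set(re.findall(r"[a-z]+", message.lower()))
    scores = {intent: len(tokens & kws) for intent, kws in INTENT_KEYWORDS.items()}
    if scores["emergency"]:
        return "emergency"  # urgent symptoms always escalate to a person
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"


def handle(message: str) -> RoutedInquiry:
    """Attach an immediate acknowledgment, then route by intent."""
    intent = classify(message)
    return RoutedInquiry(intent, ROUTES[intent], ACK)


print(handle("Can I book an appointment for Friday?"))
```

Keeping the acknowledgment independent of the routing decision is one way to keep first-response latency low even when a human has to follow up.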
- Research Article
- 10.1007/s10772-025-10232-x
- Jan 14, 2026
- International Journal of Speech Technology
- Selvan Chinnaiyan + 3 more
Optimized graph convolutional shunted self-attention neural network for multilingual speech-to-text training using cross-language voice conversion of speech representations
- Research Article
- 10.18833/spur/9/2/3
- Jan 1, 2026
- Scholarship and Practice of Undergraduate Research
- Svetlana Korolev Kapolka
This classic book, now in its fifth edition, is a pleasure to read. It is a helpful companion to the academic writing process, including planning for a research project, making good arguments, and telling the story clearly and ethically. Written in a conversational voice, this guide includes helpful analogies and direct prompts for readers to practice their skills, as well as templates. It is part of the renowned Chicago Guides to Writing, Editing, and Publishing series. The authors of the book state that “so far as we know, no other guide gives the same balanced attention to the processes of research, argumentation, and communication, as well as to how these processes influence each other.”
- Research Article
- 10.1016/j.csl.2025.101853
- Jan 1, 2026
- Computer Speech & Language
- Ashwini Dasare + 1 more
Performance assessment of voice conversion models using speech production-based parameters
- Research Article
- 10.25073/2588-1086/vnucsce.6492
- Dec 21, 2025
- VNU Journal of Science: Computer Science and Communication Engineering
- Phuong Tuan Dat + 2 more
The Vietnamese Spoofing-Aware Speaker Verification (VSASV) Challenge series represents the first systematic effort to advance spoof-resistant speaker verification for Vietnamese, a low-resource, highly tonal language characterized by rich phonetic variability. Unlike prior challenges focused on English, VSASV directly addresses the scarcity of publicly available Vietnamese spoofing corpora, a limitation that historically hindered the development of robust automatic speaker verification (ASV) and spoofing countermeasure (CM) systems. Across its 2023 and 2025 editions, VSASV introduces progressively more challenging benchmarks, including multi-corpus bona fide speech, replay attacks, neural voice conversion, modern TTS synthesis, and adversarial perturbations. The 2025 edition further incorporates a speaker-similarity-based partitioning strategy and severe train–test mismatches to emulate realistic attack scenarios. Results from more than 40 participating systems highlight the feasibility of building reliable spoofing-aware ASV pipelines under low-resource conditions, particularly when combining ASV and CM subsystems or leveraging multilingual self-supervised learning (SSL) models. The findings underscore the importance of linguistic properties, especially tonal dynamics, in shaping spoofing vulnerabilities and model generalization. This work provides a comprehensive overview of the VSASV challenge series, synthesizing insights that inform future research on deepfake detection, spoof-robust speech authentication, and inclusive biometric technologies for underrepresented languages.
Keywords: Deepfake Detection, Speaker Verification, Low-resource Languages, Vietnamese Speech Datasets
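The abstract notes that combining ASV and CM subsystems helped. One common way to combine them is score-level fusion. The sketch below is a hypothetical illustration, not the VSASV baseline: it assumes both subsystems emit scores in [0, 1] (higher means "same speaker" and "bona fide" respectively) and uses a placeholder weight and threshold.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    asv_score: float  # speaker similarity, higher = same speaker (assumed range 0..1)
    cm_score: float   # countermeasure output, higher = more likely bona fide (assumed 0..1)


def fused_score(trial: Trial, w_asv: float = 0.5) -> float:
    """Weighted-sum fusion; w_asv is a hypothetical weight tuned on development data."""
    return w_asv * trial.asv_score + (1.0 - w_asv) * trial.cm_score


def accept(trial: Trial, threshold: float = 0.5, w_asv: float = 0.5) -> bool:
    """Accept only if the fused score clears a (placeholder) decision threshold."""
    return fused_score(trial, w_asv) >= threshold


trials = {
    "target_bona_fide": Trial(asv_score=0.9, cm_score=0.95),
    "nontarget_bona_fide": Trial(asv_score=0.1, cm_score=0.8),
    "spoofed_target": Trial(asv_score=0.85, cm_score=0.05),
}
for name, t in trials.items():
    print(name, round(fused_score(t), 3), accept(t))
```

A cascade (reject whenever the CM score is below its own threshold, then run ASV) is the usual alternative; it avoids a very high ASV score compensating for a spoofed input, at the cost of tuning two thresholds.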
- Research Article
- 10.3390/s25247608
- Dec 15, 2025
- Sensors (Basel, Switzerland)
- Jinzi Li + 6 more
With the rapid advancement of speech synthesis and voice conversion technologies, audio deepfake techniques have posed serious threats to information security. Existing detection methods often lack robustness when confronted with environmental noise, signal compression, and ambiguous fake features, making it difficult to effectively identify highly concealed fake audio. To address this issue, this paper proposes a Dual-Path Time-Frequency Attention Network (DPTFAN) based on Pythagorean Hesitant Fuzzy Sets (PHFS), which dynamically characterizes the reliability and ambiguity of fake features through uncertainty modeling. It introduces a dual-path attention mechanism in both time and frequency domains to enhance feature representation and discriminative capability. Additionally, a Lightweight Fuzzy Branch Network (LFBN) is designed to achieve explicit enhancement of ambiguous features, improving performance while maintaining computational efficiency. On the ASVspoof 2019 LA dataset, the proposed method achieves an accuracy of 98.94%, and on the FoR (Fake or Real) dataset, it reaches an accuracy of 99.40%, significantly outperforming existing mainstream methods and demonstrating excellent detection performance and robustness.
- Research Article
- 10.54656/4rq8f233
- Dec 8, 2025
- Journal of Community Engagement and Scholarship
- Stephanie Anne Shelton + 2 more
This qualitative ethnodrama-based study explores the perspectives of four juvenile justice facility leaders on the role and impact of community partnerships in supporting justice-involved youth in restrictive settings. Despite existing literature on the benefits of community partnerships for youth, limited attention has been paid to their function within juvenile justice. Framed by an adaptation of Bakhtin’s theory of dialogue and presented through ethnodrama, this study centers the experiences and perspectives of staff members working in diverse juvenile facilities across the U.S. Participants’ reflections, gathered through asynchronous, open-ended written dialogue, emphasize how meaningful partnerships with schools, local organizations, and community-based agencies enhance educational opportunities, behavioral outcomes, and facility climates. Simultaneously, participants identify persistent challenges that limit such partnerships’ effectiveness and consistency. This study underscores the necessity of sustained, adaptive, and mutually respectful community engagement. By presenting staff voices in conversation, this work highlights how partnerships—when thoughtful and contextually aware—can foster hope, promote youth agency, and support both youth and staff within systems often marked by marginalization. This research contributes to the sparse scholarship focused on juvenile justice educators and offers implications for policy, practice, and partnership design that better serve youth in restrictive settings.
- Research Article
- 10.48084/etasr.13400
- Dec 8, 2025
- Engineering, Technology & Applied Science Research
- Ali Osman Mohammed Salih + 5 more
As voice authentication systems become increasingly integral to critical domains such as banking, smart assistants, and remote identity verification, they face escalating threats from AI-generated audio, commonly referred to as deepfakes. These synthetic voices, produced through advanced text-to-speech and voice conversion technologies, can convincingly imitate human speech, thereby undermining the reliability and security of authentication frameworks. This study provides a comprehensive review of spectral-based techniques for deepfake audio detection, highlighting the roles of spectrograms, Mel-Frequency Cepstral Coefficients (MFCC), and Constant-Q Transform (CQT) in exposing time-frequency anomalies. The integration of Convolutional Neural Network (CNN)-based spoof detection modules before identity verification is identified as a critical architectural strategy to enhance system resilience. This review also outlines the prevailing challenges, including vulnerability to emerging generative models, limited interpretability of deep learning classifiers, and decreased robustness under realistic or noisy conditions. To advance the field, this study emphasizes promising research directions such as hybrid modeling approaches, adversarial training techniques, and the development of multilingual open-access deepfake audio datasets. By critically synthesizing existing research, this review aims to inform the design of more robust, generalizable, and transparent voice authentication systems capable of withstanding the evolving landscape of audio-based threats.
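The review centers on time-frequency representations. As a minimal, illustrative sketch (pure Python with a naive DFT, not code from any reviewed system), the following computes a Hann-windowed power spectrogram and the HTK-style mel mapping used to space MFCC filterbanks. Frame length, hop size, and the 1 kHz test tone are arbitrary choices, and CQT is not covered.

```python
import cmath
import math


def hann(n: int) -> list[float]:
    """Symmetric Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def power_spectrogram(signal: list[float], frame_len: int = 256, hop: int = 128) -> list[list[float]]:
    """Short-time power spectrum; returns [frame][bin] with bins 0..frame_len // 2."""
    window = hann(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        seg = [x * w for x, w in zip(signal[start:start + frame_len], window)]
        # Naive O(N^2) DFT for clarity; practical systems use an FFT.
        row = []
        for k in range(frame_len // 2 + 1):
            acc = sum(x * cmath.exp(-2j * math.pi * k * n / frame_len) for n, x in enumerate(seg))
            row.append(abs(acc) ** 2)
        frames.append(row)
    return frames


def hz_to_mel(f_hz: float) -> float:
    """HTK-style mel scale, used to space the triangular filters in MFCC extraction."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


fs = 8000  # sample rate in Hz
tone = [math.sin(2 * math.pi * 1000 * n / fs) for n in range(1024)]  # 1 kHz test tone
spec = power_spectrogram(tone)
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
print(len(spec), peak_bin, peak_bin * fs / 256)  # frames, peak bin, bin frequency in Hz -> 7 32 1000.0
```

MFCCs would continue from here by pooling each frame's power through mel-spaced triangular filters, taking the log, and applying a DCT.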
- Research Article
- 10.1186/s42400-025-00490-2
- Dec 1, 2025
- Cybersecurity
- Ruixin Song + 4 more
Adversarial attacks on speaker identification (SI) systems have become a critical security concern, particularly in targeted black-box scenarios where access to the target model is limited. This paper proposes a novel framework that creates highly transferable adversarial examples. We use a voice conversion (VC) model to synthesize shadow data from a single target speech sample, which is then used to train two diverse surrogate models. Neural Tangent Kernel (NTK) theory is employed to align acoustic feature spaces, while mutual information optimization enforces consistency between the surrogate models’ predictions. Consequently, the adversarial attack is formulated as a min-max game that maximizes attack success while preserving speech quality. Extensive experiments on LibriSpeech and VCTK datasets demonstrate that our method significantly improves the transferability and effectiveness of adversarial examples compared to conventional approaches. Our findings suggest that generating shadow data through voice conversion followed by surrogate model training under information-theoretic constraints is a promising strategy for robust adversarial attacks.
- Abstract
- 10.1002/alz70856_101171
- Dec 1, 2025
- Alzheimer's & Dementia
- James Glass + 15 more
Background: Analysis of digital voice (dVoice) is emerging as an inclusive approach to detecting the earliest preclinical symptoms of Alzheimer's disease (AD) and related dementias (ADRD) because of the widespread penetration of recording devices such as the smartphone. Speaking is a cognitively complex task that concomitantly carries embedded neuropsychiatric features. However, the promise of digital voice is impeded by the personal identifying information (PII) inherent in the voice print and by the lack of automated processing tools to extract AD/ADRD features of interest. The digital workgroup of the Global Research and Imaging Platform (GRIP) is developing open‐source tools to remove these barriers and unlock dVoice's scientific potential.
Method: Leveraging >33,000 longitudinal digital voice recordings collected from Framingham Heart Study (FHS) participants using recording devices of differing fidelity from 2005 to the present, we have developed and tested 1) a fictitious voice conversion (FVC) method that masks the original voice print while preserving audio features, 2) a suite of automated audio, linguistic, and paralinguistic feature extraction tools, 3) a natural language processing (NLP) framework to excise PII from dVoice transcripts, and 4) a privacy-protecting (PP) pipeline for AI-driven analysis.
Result: We have applied the FVC method to 92 FHS recordings, ADReSS, and LibriTTS. Using version 1 of our automated feature extraction tools, we extracted acoustic, linguistic, and paralinguistic features from all FHS dVoice recordings. We have applied the NLP‐PII framework to >350 manual transcriptions of FHS dVoice recordings that include marked PII. We have analyzed 128 FHS recordings, DementiaBank (Delaware), and ADReSS with the PP‐AI analytic approach. These tools have been released by GRIP in its modularly organized system, allowing users to select those relevant to their dVoice workflows. The FVC method's preservation of audio features in a form that cannot be reverse engineered did not reach sufficient analytic comparability with dVoice recordings in their native format.
Conclusion: The GRIP v1 release of dVoice processing tools is sufficiently robust to be used on recordings with varying levels of fidelity. The availability of these tools facilitates studies using voice recordings to explore their utility for measuring cognitive and behavioral symptoms of early AD/ADRD, including during the preclinical stage.
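The abstract mentions an NLP framework that removes PII from transcripts but does not describe its internals. As a deliberately simple, rule-based stand-in (not GRIP's method), the sketch below replaces e-mail addresses, US-style phone numbers, and numeric dates with placeholder tags. The patterns and tag names are assumptions; names and places would require a named-entity recognizer.

```python
import re

# Hypothetical patterns covering only a few structured identifier types.
PII_PATTERNS = [
    ("EMAIL", re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")),
    ("PHONE", re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")),
    ("DATE", re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b")),
]


def redact(transcript: str) -> str:
    """Replace each matched identifier with a bracketed tag such as [PHONE]."""
    for label, pattern in PII_PATTERNS:
        transcript = pattern.sub(f"[{label}]", transcript)
    return transcript


print(redact("Call me at 508-555-0123 or jane.doe@example.com on 3/14/2024."))
```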
- Research Article
- 10.1093/geroni/igaf122.1743
- Dec 1, 2025
- Innovation in Aging
- Othelia Lee + 2 more
Objectives: The digital divide and limited digital literacy hinder technology adoption among older adults in low-resource communities. This study examines how socially assistive robots (SARs) can facilitate social engagement and emotional well-being among older adults by analyzing conversational interactions with the AI-driven Hyodol SAR.
Methods: Multimodal data, including log-based usage patterns and conversational voice data, were collected through SAR-embedded sensors. Pre- and post-surveys gathered information on demographics and health status. Human-robot conversations were categorized into nine emotional and topical categories and six types of activity participation. To explore user engagement patterns, K-means clustering was applied to identify distinct user personas.
Results: Among participants, 44.6% engaged in conversations with the SARs, with 30.2% discussing their activity participation. Three personas emerged: Social Butterflies (n = 19, 28.35%) maintained balanced engagement in social and personal activities, with positive emotional exchanges but limited long-term impact on well-being. Lone Wolves (n = 28, 41.79%) had low social engagement, yet showed notable improvements in emotional well-being through conversational interactions with the SAR. Emotional Peacocks (n = 20, 29.85%) displayed high emotional and sensory engagement with the SAR, demonstrating the greatest reduction in loneliness among the three groups.
Discussion: Findings suggest that SARs helped mitigate social isolation, especially among older adults with limited social engagement. Furthermore, the integration of large language models with SAR technology enables autonomous, dialogue-based AI companions delivering personalized, emotion-sensitive interactions. Future research should explore adaptive AI learning models, ethical considerations in caregiving, and the long-term effects of SAR-facilitated engagement on well-being.
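The study applied K-means clustering to identify user personas. A minimal pure-Python sketch of Lloyd's algorithm follows. The two features and the nine toy users are hypothetical and are not the study's data; using the first k points as starting centroids is a simplification for reproducibility.

```python
import math


def kmeans(points, k, max_iters=100):
    """Lloyd's algorithm; returns (labels, centroids)."""
    # Simplified deterministic init: the first k points become the starting centroids.
    # (k-means++ or several random restarts are common in practice.)
    centroids = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(max_iters):
        # Assignment step: each point joins its nearest centroid (Euclidean distance).
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # converged: no assignment changed
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # leave an emptied cluster's centroid where it was
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids


# Toy users: (social-activity score, emotional-engagement score), both hypothetical.
users = [
    (0.8, 0.5), (0.2, 0.2), (0.3, 0.9),
    (0.9, 0.6), (0.7, 0.4),
    (0.1, 0.3), (0.2, 0.1),
    (0.2, 0.8), (0.4, 1.0),
]
labels, centroids = kmeans(users, k=3)
print(labels)  # -> [0, 1, 2, 0, 0, 1, 1, 2, 2]
```

In a real analysis the features would be standardized first, and k would be chosen with a criterion such as the elbow method or silhouette scores.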
- Research Article
- 10.1016/j.jvoice.2025.10.018
- Nov 1, 2025
- Journal of voice : official journal of the Voice Foundation
- Serena Pu + 4 more
Exploring Voice Banking as an Alternative Augmentative Communication Strategy for Individuals with Dysphonia, Aphonia, and Dysarthria: A Scoping Review.
- Research Article
- 10.1016/j.jvoice.2025.09.027
- Nov 1, 2025
- Journal of voice : official journal of the Voice Foundation
- Elizabeth U Grillo
Comparison of the Global Voice Prevention and Therapy Model with the Estill Voice Model and Conversation Training Therapy in Professional and Student Teachers.
- Research Article
- 10.1186/s13636-025-00422-5
- Oct 27, 2025
- EURASIP Journal on Audio, Speech, and Music Processing
- Nayereh Seyed Afiuny + 1 more
ICRCycleGAN-VC: a robust one-to-one voice conversion method based on CycleGAN and inception-ResNet blocks
- Research Article
- 10.1093/ptj/pzaf125
- Oct 16, 2025
- Physical therapy
- Gregory W Hartley + 4 more
While health systems science (HSS) is now recognized as a foundational pillar in medical education, the profession of physical therapy has yet to fully integrate this unifying framework into its educational models. Health systems science offers a structured lens through which the profession can align its long-standing values such as patient-centered care, equity, and interprofessional collaboration, with the demands of a health care system that is complex, fragmented, and driven by accountability, data, and value. Without explicit incorporation of HSS into Doctor of Physical Therapy (DPT) curricula, the profession may have a diminished voice in critical conversations around health care equity, health system innovation, policy reform, and care redesign. This perspective presents an example from the University of Miami's DPT program, where HSS was systematically embedded across the curriculum using Kern's 6-step model for curriculum development. The process included comprehensive content mapping and intentional faculty development to promote a shared understanding of systems thinking and its relevance to physical therapist practice. As a result, DPT students are now engaged in learning that situates their clinical decision making within the broader structures, policies, and processes that shape patient outcomes at both individual and population levels. Health systems science enables physical therapists to move beyond implicit alignment with health system goals to active participation in advancing them. A physical therapist educated in HSS is positioned to contribute to population health by designing community-based interventions, participating in cross-sector partnerships, addressing social determinants of health, and applying data to reduce disparities in function and access. 
The framework also supports engagement in value-based care delivery, quality improvement initiatives, health informatics, and health policy development, areas central to the sustainability and evolution of health care. To help the profession remain relevant and impactful, this perspective offers a call to action for physical therapist educators to integrate HSS as a core component of professional formation and practice readiness.
- Research Article
- 10.1016/j.pmn.2025.04.003
- Oct 1, 2025
- Pain management nursing : official journal of the American Society of Pain Management Nurses
- Marcia Y Shade + 3 more
Interactive AI Routines for Pain Symptoms and Loneliness in Older Adults.
- Research Article
- 10.14419/w36qrh81
- Sep 7, 2025
- International Journal of Basic and Applied Sciences
- Rekha Rani + 1 more
Reliable voice-based authentication has become increasingly important with the adoption of voice-controlled technologies and digital transactions. Automatic Speaker Verification (ASV) offers a dependable approach because it can confirm a person's identity from their speech. ASV is primarily used in telecommunications, banking, law enforcement, and smart assistants to improve security and user convenience. However, spoofing attacks such as voice conversion and speech synthesis increasingly target these systems and undermine their reliability. This review discusses the importance of strengthening ASV systems with data augmentation to address new kinds of attacks, transfer learning to improve detection when data are scarce, and adaptive models to keep pace with advancing spoofing attacks.
- Research Article
- 10.64719/pb.4543
- Aug 12, 2025
- Psychopharmacology Bulletin
- Courtney Barth + 2 more
Psychogenic voice disorder, often a manifestation of conversion disorder, is characterized by a sudden impairment of voice following a stressful event or other psychological cause. This case report presents a patient with a psychogenic voice disorder featuring the atypical ability to sing despite losing conversational voice. Few case reports exist on psychogenic speech and voice disorders, and no cases in the current literature examine the loss of conversational voice with preservation of singing voice. In this case, the patient experienced sudden onset stuttering which progressed to complete loss of voice in all settings while retaining the ability to sing. Despite extensive medical, psychiatric, and speech-language evaluations, including psychotherapy and speech therapy, the symptoms persisted, highlighting the diagnostic and treatment challenges in psychogenic voice disorders. This case underscores the complex interplay between psychological stressors and physical symptoms in psychogenic voice disorders, and highlights the lack of effective, evidence-based therapies for psychogenic voice disorders.
- Research Article
- 10.3233/shti251250
- Aug 7, 2025
- Studies in health technology and informatics
- Sina Rashidi + 4 more
Speech biomarkers are critical for diagnosing conditions like Alzheimer's disease and related dementias (ADRD), but data scarcity hinders progress. We present SpeechCura, an open-source framework for speech data augmentation using non-generative methods (e.g., frequency masking, time masking) and generative methods (e.g., Text-to-Speech, voice conversion). Evaluated on the DementiaBank dataset, SpeechCura improved the performance of the ADRD detection pipeline TransformerCARE by 7.4% in F1-score and 3.8% in AUC-ROC with voice conversion methods. These results highlight the potential of SpeechCura in addressing data scarcity challenges and its adaptability for other healthcare applications, ultimately enhancing diagnostic performance and patient outcomes.
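The abstract names frequency masking and time masking as non-generative augmentations. A minimal pure-Python sketch on a toy spectrogram, stored as a list of frequency rows, follows. The function names, mask widths, and zero fill value are assumptions rather than SpeechCura's API, and each `max_width` must not exceed the corresponding spectrogram dimension.

```python
import random


def freq_mask(spec, max_width, rng):
    """Zero a random band of 0..max_width consecutive frequency rows (returns a copy)."""
    out = [row[:] for row in spec]
    width = rng.randint(0, max_width)
    start = rng.randint(0, len(out) - width)
    for f in range(start, start + width):
        out[f] = [0.0] * len(out[f])
    return out


def time_mask(spec, max_width, rng):
    """Zero a random span of 0..max_width consecutive time frames (returns a copy)."""
    out = [row[:] for row in spec]
    width = rng.randint(0, max_width)
    start = rng.randint(0, len(out[0]) - width)
    for row in out:
        for t in range(start, start + width):
            row[t] = 0.0
    return out


rng = random.Random(0)  # fixed seed for reproducible augmentation
spec = [[1.0] * 10 for _ in range(8)]  # toy spectrogram: 8 frequency bins x 10 frames
augmented = time_mask(freq_mask(spec, max_width=3, rng=rng), max_width=4, rng=rng)
print(sum(v == 0.0 for row in augmented for v in row), "of 80 cells masked")
```

Both functions copy their input, so the original spectrogram can be augmented repeatedly to produce several distinct training examples.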