Articles published on prosodic-features
1918 Search results
- Research Article
- 10.61173/dt4ezq43
- Oct 29, 2024
- Arts, Culture and Language
- Chuanqi Sun + 1 more
English, as the most widely used language globally, is increasingly valued by parents and educators. In China, a growing number of children choose English as their second language for learning after being able to use Mandarin relatively proficiently. Mandarin speakers learning English may face many challenges, one of which is that native Mandarin speakers find it difficult to speak English naturally like native English speakers. Some scholars believe that the disparity in prosodic features between Mandarin and English is one of the key factors causing this difficulty. Compared to adult learners, children have an unstable prosodic development system, resulting in significant differences in language acquisition. Therefore, understanding the relationship between the prosodic features of Mandarin-speaking children and their English production is crucial. From the perspective of structure and purpose, this paper outlines a systematic literature review framework, hoping to summarize the previous articles on the role of prosody in the English production of Mandarin-speaking children, and provide some useful results that can help people in this field to improve their work. From the perspective of content, in this systematic literature review, we are going to propose and solve two questions: (1) Does the prosody of Mandarin influence Mandarin-speaking children’s English production? (2) If the influences do exist, what are the specific influences? To solve these two questions, we first search for articles using the search strategies that we design. Then we plan to screen the articles retrieved by the search against the inclusion and exclusion criteria that we design. Finally, we plan to use quantitative analysis to analyze these samples and give the eventual result.
- Research Article
2
- 10.55003/cast.2024.257184
- Oct 16, 2024
- CURRENT APPLIED SCIENCE AND TECHNOLOGY
- Smitha Narendra Pai + 2 more
Emotions play a key role in determining the human mental state and indirectly express an individual’s well-being. A speech emotion recognition system can extract a person’s emotions from his/her speech inputs. There are some universal emotions such as anger, disgust, fear, happiness, pleasantness, sadness and neutral. These emotions are of significance especially in a situation like the Covid pandemic, when the aged or sick are vulnerable to depression. In the current paper, we examined various classification models with finite computational strength and resources in order to determine the emotion of a person from his/her speech. Speech prosodic features like pitch, loudness, and tone of speech, and spectral features such as Mel Frequency Cepstral Coefficients (MFCCs) of the voice were used to analyze the emotions of a person. Although sequence-to-sequence state-of-the-art models for speech detection that offer high levels of accuracy and precision are currently in use, the computational needs of such approaches are high and inefficient. Therefore, in this work, we emphasised analysis and comparison of different classification algorithms such as multi-layer perceptron, decision tree, support vector machine, and deep neural networks such as convolutional neural network and long short-term memory. Given an audio file, the emotions that were exhibited by the speaker were recognized using machine learning and deep learning techniques. A comparative study was performed to identify the most appropriate algorithms that could be used to recognize emotions. Based on the experiment results, the MLP classifier and convolutional neural network model offered better accuracy with smaller variations when compared with other models used for the study.
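As a rough illustration of the prosodic features named in this abstract, the sketch below computes a loudness proxy (RMS energy) and a pitch estimate (autocorrelation peak) for a single frame. It is not the authors' feature pipeline: the function names, the 75 to 400 Hz search range, and the synthetic 200 Hz test tone are all assumptions made for the example.

```python
import math

def rms_energy(frame):
    """Root-mean-square amplitude of one frame (a rough loudness proxy)."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))

def autocorr_pitch(frame, sample_rate, fmin=75.0, fmax=400.0):
    """Estimate F0 in Hz as the lag with the highest raw autocorrelation.

    fmin/fmax are assumed search bounds, not values from the abstract.
    """
    min_lag = int(sample_rate / fmax)
    max_lag = int(sample_rate / fmin)
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        score = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag

# Hypothetical input: a 40 ms synthetic 200 Hz tone sampled at 8 kHz.
SAMPLE_RATE = 8000
tone = [0.5 * math.sin(2 * math.pi * 200.0 * n / SAMPLE_RATE) for n in range(320)]

loudness = rms_energy(tone)
pitch_hz = autocorr_pitch(tone, SAMPLE_RATE)
print(f"RMS energy: {loudness:.4f}, estimated F0: {pitch_hz:.1f} Hz")
```

Real speech would need framing, voicing detection, and a more robust pitch tracker; per-frame values like these are typically summarized (mean, range, slope) before being passed to a classifier.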
- Research Article
- 10.31261/tapsla.14601
- Oct 15, 2024
- Theory and Practice of Second Language Acquisition
- Łukasz Matusz
Modern coursebooks serve a fundamental function in contemporary ELT practice. This paper discusses the problem of representation of anger in listening activities from selected ELT coursebooks issued by leading publishing companies. Thirteen coursebooks from three internationally-recognized ELT series for adult learners of English were analysed for the conflictive dialogues presented in their audio materials, as well as for the ways in which the anger of the Speaker(s) was expressed. The result of the analysis shows that interpersonal exchanges portrayed in the database coursebooks were largely oriented towards the expression of polite interpersonal beliefs, the culture of positivity and attitude of agreement and cooperation. In situations where conflict was presented in the recordings, anger was expressed primarily through prosodic features of speech, followed by the presence of exclamations and certain non-verbal vocalisations. No instances of swearing and expletive interjections, a common way of expressing negative emotions in everyday informal communication, were found in the database. The analysis confirms some of the observations and criticisms concerning the global ELT coursebooks. While understanding publishers’ caution and decidedly refraining from advocating unrestricted use of taboo language in recorded ELT materials, this paper points to the importance of realistic representation of conflictive and argumentative interpersonal communication, not just for the aim of presenting different contexts of English use, but also for the practical applications outside the realm of foreign language learning.
- Research Article
- 10.35520/diadorim.2023.v25n3a62803
- Oct 15, 2024
- Revista Diadorim
- Lucas De Souza
This paper aims to analyze how constructional variation and constructional stabilization occur with [sei] and [aham sei] as disbelief answers in Brazilian Portuguese (BP) instead of their common use linked to cognitive sense (the act of knowing something/knowing about something or knowing how to do something), perceived primarily by differences in intonation and conversational context. For this, we’ll see the results found by Souza (2024) who, based on corpora data and tests made with native speakers, seeks to understand a little further how constructional variation happens in this case and what are the preponderant factors for these speakers to choose those constructions instead of others or vice-versa in some specific dialogical scenarios. His work, elaborated under a socioconstructionist profile (Machado Vieira, 2016; Machado Vieira and Wiedemer, 2018; 2019a; 2019b) studied how Construction Grammar (Goldberg 1995, 2006; Traugott and Trousdale, 2013) and Variationist Sociolinguistics (Weinreich, Labov and Herzog 1968; Eckert, 2012) can contribute to explain empirically the results found, which led us to follow the same theoretical path in order to expand the examples and expose interesting data shown in his work, especially how schematicity and productivity (Traugott and Trousdale, 2013) happen within these constructions. In sum, we present the most prominent results of a work that studies how prosodic features interfere in the form-meaning pairing in the analyzed data, an interface still underexplored in the field of constructional studies.
- Research Article
5
- 10.1037/xlm0001355
- Oct 1, 2024
- Journal of experimental psychology. Learning, memory, and cognition
- Mara Breen + 3 more
Young children's prosodic fluency correlates with their reading ability, as children who are better early readers also produce more adult-like prosodic cues to syntactic and semantic structure. But less work has explored this question for high school readers, who are more proficient readers, but still exhibit wide variability in reading comprehension skill and prosodic fluency. In the current study, we investigated acoustic indices of prosodic production in high school students (N = 40; ages 13-19) exhibiting a range of reading comprehension skill. Participants read aloud a series of 12 short stories which included simple statements, wh-questions, yes-no questions, quotatives, and ambiguous and unambiguous multiclausal sentences. In addition, to assess the contribution of discourse coherence, sentences were read in either canonical or randomized order. Acoustic cues known to index prosodic phenomena (duration, fundamental frequency, and intensity) were extracted and compared across structures and participants. Results demonstrated that high school readers as a group consistently signal syntactic and semantic structure with prosody, and that reading comprehension skill, above and beyond lower-level skills, correlates with prosodic fluency, as better comprehenders produced stronger prosodic cues. However, discourse coherence did not produce consistent effects. These results strengthen the finding that prosodic fluency and reading comprehension are linked, even for older, proficient readers. (PsycInfo Database Record (c) 2024 APA, all rights reserved).
- Research Article
3
- 10.1016/j.jvoice.2024.09.012
- Oct 1, 2024
- Journal of Voice
- Zuyu Du + 4 more
Estimation of Speech Features Using a Wearable Inertial Sensor
- Research Article
1
- 10.1121/10.0035084
- Oct 1, 2024
- The Journal of the Acoustical Society of America
- Maryam Karimi Boroujeni + 4 more
Background: The Speech-evoked Frequency Following Response (sFFR) provides spectro-temporal data on speech processing in the auditory system. Its effectiveness in extracting prosodic features like variations in fundamental frequency (F0 contour) and intensity is uncertain. Objectives: This study examines how well sFFR tracks F0 contour in different emotions using a natural two-syllable word. It also explores the impact of talker gender on F0 contours and gender disparity in encoding prosodic cues. Method: The word “balloon”, spoken by male and female speakers with sad and happy emotions, elicited FFR from 16 adults (8 males, aged 18–31). A pitch estimation algorithm calculated root mean squared error and 5% accuracy to evaluate the response’s fidelity to F0 contour under different conditions. Results: The sFFR tracked prosodic speech features, influenced by emotion type and talker voice characteristics. Participants identified emotions most accurately from sad male voices. Lower F0 trajectories corresponded to more reliable FFR responses, showing better tracking of male voices and sad emotions. No significant gender-related differences were observed in emotional data processing. Conclusion: These findings highlight sFFR’s utility in capturing dynamic speech properties and its potential in clinical assessments. Future research should explore prosody processing in hearing-impaired individuals and consider integrating sFFR into diagnostic protocols.
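The two fidelity measures mentioned in the Method can be sketched as follows. The abstract does not define "5% accuracy"; this example assumes it means the share of frames whose estimated F0 lies within 5% of the reference F0. The function names and the four-frame contours are hypothetical.

```python
import math

def f0_rmse(reference_hz, estimated_hz):
    """Root mean squared error between two equal-length F0 contours (Hz)."""
    squared = [(e - r) ** 2 for r, e in zip(reference_hz, estimated_hz)]
    return math.sqrt(sum(squared) / len(squared))

def f0_accuracy(reference_hz, estimated_hz, tolerance=0.05):
    """Fraction of frames whose estimate is within `tolerance` of the reference.

    Assumed reading of "5% accuracy"; the study may define it differently.
    """
    hits = sum(
        1 for r, e in zip(reference_hz, estimated_hz) if abs(e - r) <= tolerance * r
    )
    return hits / len(reference_hz)

# Hypothetical stimulus contour and response-derived contour, one value per frame.
reference = [200.0, 210.0, 220.0, 230.0]
estimated = [202.0, 208.0, 240.0, 229.0]

rmse = f0_rmse(reference, estimated)
accuracy = f0_accuracy(reference, estimated)
print(f"RMSE: {rmse:.2f} Hz, within-5% accuracy: {accuracy:.2f}")
```

The two measures are complementary: a single large tracking error (the third frame here) raises the RMSE sharply but costs only one frame in the tolerance-based score.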
- Research Article
- 10.3791/66313
- Sep 27, 2024
- Journal of visualized experiments : JoVE
- Leônidas Silva + 1 more
This research aims to examine both the prosodic-acoustic features and the perceptual correlates of foreign-accented English and foreign-accented Brazilian Portuguese and check how the speakers' productions of foreign and native accents are correlated to the listeners' perception. In the Methodology, we conducted a speech production procedure with a group of American speakers of L2 Brazilian Portuguese and a group of Brazilian speakers of L2 English, and a speech perception procedure in which we performed voice lineups for both languages. For the speech production statistical analysis, we ran Generalized Additive Models to evaluate the effect of the language groups on each class (metric or prosodic-acoustic) of features controlled for the smoothing effect of the covariate(s) of the opposite class. For the speech perception statistical analysis, we ran a Kruskal-Wallis test and a post-hoc Dunn's test to evaluate the effect of the voices of the lineups on the scores judged by the listeners. We nevertheless conducted acoustic (voice) similarity tests based on Cosine and Euclidean distances. Results showed significant acoustic differences between the language groups in terms of variability of the f0, duration, and voice quality. For the lineups, the results indicated that prosodic features of f0, intensity, and voice quality correlated to the listeners' perceived judgments.
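The two similarity measures named here behave differently, which a small example can make concrete. This is a generic sketch, not the protocol's code; the three-element "voice" vectors are made-up stand-ins for per-speaker acoustic feature vectors.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance; sensitive to the overall magnitude of features."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):
    """1 minus cosine similarity; depends only on the direction of the vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

# Hypothetical feature vectors (for example, scaled f0, duration, intensity).
voice_a = [1.0, 2.0, 2.0]
voice_b = [2.0, 4.0, 4.0]   # same profile as voice_a, twice the magnitude
voice_c = [2.0, -1.0, 0.0]  # orthogonal to voice_a

print(euclidean_distance(voice_a, voice_b), cosine_distance(voice_a, voice_b))
print(euclidean_distance(voice_a, voice_c), cosine_distance(voice_a, voice_c))
```

Two voices with the same feature profile at different scales are far apart in Euclidean terms but nearly identical under cosine distance, which is one reason to report both.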
- Research Article
- 10.62441/nano-ntp.vi.2293
- Sep 26, 2024
- Nanotechnology Perceptions
- Akbar Ali + 1 more
The objective of this study is to attempt to build an emotion recognition system through speech samples using deep learning techniques. Emotions are a fundamental human trait, serving as a means of expressing thoughts and communicating intention. Emotion Recognition systems analyse audio signals to extract and predict the emotional state of a speaker. Emotions are generally classified as Anger, Happiness, Sadness, and Neutral. These systems rely on spectral and prosodic features to detect emotions. Mel-frequency Cepstral Coefficients (MFCC) are a significant spectral attribute, while prosodic attributes include frequency, loudness, and pitch. The frequency of an audio broadcast can be used to distinguish between various sounds and ascertain the gender of the speaker. The study shows that Support Vector Machines (SVM) can be used in Emotion Recognition for categorisation and prediction tasks, especially in identifying the speaker's gender. Emotions are identified from certain attributes using additional machine learning models such as Radial-Basis Function (RBF) and Back Propagation networks. The proposed model shows an accuracy of 72%, reflecting the reliability of CNN modelling.
- Research Article
- 10.17507/tpls.1409.05
- Sep 12, 2024
- Theory and Practice in Language Studies
- Bui Nguyen Nguyet Minh + 2 more
In teaching pronunciation, the traditional articulatory approach, commonly used in Vietnamese classrooms, has shown limitations in addressing the phonetic challenges posed by the differences between Vietnamese and English consonant systems. This study investigates the use of an alternative approach, the Simplified Verbotonal Approach (SVA), in improving the pronunciation of voiceless consonants among Vietnamese EFL undergraduates. The SVA, which emphasizes prosodic features through intensive practice with lowpass filtered speech, was hypothesized to aid learners in producing more accurate voiceless consonants. A mixed-methods quasi-experimental design was employed, involving 70 first-year non-English major students. The control group received instruction using standard pronunciation textbooks, while the experimental group utilized an online platform incorporating SVA principles. Pre- and post-tests assessed participants' pronunciation of voiceless consonants in isolation, sentences, and passages. Semi-structured interviews provided qualitative insights into learners' opinions of the SVA. Quantitative results demonstrated significant improvements in the experimental group's pronunciation accuracy, particularly in sentences and passages. Qualitative data revealed positive student feedback on the SVA. These findings suggest that integrating prosodic training through the SVA can significantly enhance the pronunciation of voiceless consonants in Vietnamese learners, offering a viable alternative to traditional articulatory methods in EFL contexts.
- Research Article
1
- 10.1017/s0332586524000052
- Sep 9, 2024
- Nordic Journal of Linguistics
- Ditte Zachariassen
This article presents structural and interactional aspects of Strong Finals, a prosodic feature characterised by lengthening, increased volume, and non-falling intonation on word-final syllables. Interactionally, Strong Finals support five types of action: listing, projecting a description, stating conditions, asking questions, and announcing reported speech. In general, Strong Finals project that there is more to come, and this ‘more’ may in some cases be provided by either participant. Strong Finals are often found in multi-speaker settings, where they assist speakers in taking the floor or changing the topic. The article’s descriptions are based on recordings of natural spoken interaction in linguistically diverse areas in Aarhus, Denmark. Here, a new urban dialect has developed like other urban dialects that have been described in Copenhagen and other North Germanic cities. Strong Finals are a local phenomenon, however, and are not found in the Copenhagen studies.
- Research Article
4
- 10.1080/23311983.2024.2391646
- Sep 4, 2024
- Cogent Arts & Humanities
- Ehab Saleh Alnuzaili + 5 more
The current study aims to establish that emojis are graphic equivalents for prosodic features in natural speech. The rapid emergence of artificial intelligence (AI) in the field of communication prompts an investigation of emojis, used as a strategy to achieve visual prosody in computer-mediated communication (CMC). The study adopts a mixed methodology and computer-mediated discourse analysis (CMDA) that focuses on manipulation of grammatical rules to create typographic impression to achieve prosody. Data is collected from 300 WhatsApp and Facebook users using a questionnaire as the instrument. Data has been analyzed in two ways: descriptive analysis of chats and quantitative analysis getting frequencies of the emojis. The study arguably provides: (i) syntactic variants of the use of emojis: emoji aspect, ellipsis, embedding, embedded, bare, partial bare, beside words, repetition of emoji and repetition of emoji encoded utterance, (ii) variant manifestations of emojis displaying emotions at scalar level, and (iii) basic emotions: anger, joy, sarcasm, fear and neutral displayed via emojis’ varying manifestations; laughter, pouting face, non-emotion emojis representing objects and human faces. The overall findings predict that the observed construction is encoded with different placements of emojis and emojis are displaying diverse manifestations to exhibit prosodic features in CMC such as intonation, rhythm, duration, pause and stress. These prosodic features of emojis manifest aggression, love, hate, sadness and joy. The study hopes to augment the knowledge in the field of CMC and motivate future researchers to conduct further studies in graphic switching and social implementation of CMC.
- Research Article
2
- 10.1016/j.dcm.2024.100819
- Aug 30, 2024
- Discourse, Context & Media
- Christian Ilbury
Contemporary research has shown that a combination of qualitative and quantitative methods is productive in exploring patterns of Digitally Mediated Communication (DMC). In this paper, I demonstrate the analytical potential of this approach by studying the typographic representation of a prosodic feature of spoken language – High Rising Terminals (HRTs, e.g., that beer pong place I went for my birthday?) – in a large corpus of WhatsApp messages (96,471 messages; 594,183 words) sent by 15 young British adults. Combining methods and approaches from variationist and interactional sociolinguistics, I show that the orthographic representation of HRTs patterns in pragmatically similar ways to the feature in speech in that it most frequently functions as a way of verifying the interlocutors’ comprehension of discourse-new information. The precise rate and pragmatic function of this feature, however, appears to be constrained by the textual modality of the platform. Concluding, I join others in arguing for the analytical potential of employing a multidimensional approach to studying variable patterns of DMC.
- Research Article
1
- 10.34069/ai/2024.80.08.10
- Aug 30, 2024
- Revista Amazonia Investiga
- Nataliia Shkvorchenko + 4 more
Research on prosodic influences on syntactic structures is particularly relevant in the context of modern media, where the quality and impact of speech are crucial. Contemporary news discourse is characterized by high competition for the audience's attention, so understanding the role of prosodic elements can aid in creating more engaging and effective content. With the rapid development of technology and changing ways of consuming information, researching such aspects of speech contributes to adapting news formats to the needs of the modern listener. The aim of this study is to analyze how intonation, rhythm, pauses, and stress can alter or emphasize syntactic constructions in speech, specifically in news discourse. The research methodology includes methods such as analysis, psycholinguistic methods, and content analysis. The study concludes that in news discourse, prosodic elements are essential for conveying information in a way that is understandable, attention-grabbing, and memorable for listeners. For example, a pause before important information or a rise in pitch to emphasize certain news can significantly affect audience perception. Prosody, which includes intonation, rhythm, and stress, plays a crucial role in shaping the perception and understanding of information by listeners. Specific examples from news programs are analyzed, where prosodic characteristics change or enhance the meaning of individual syntactic constructions. It has been established that the correct use of prosody can improve the communicative effectiveness and impact of news content. Cases are also examined where prosodic elements contribute to creating certain emotional reactions in listeners, thereby influencing their opinions about the presented information.
- Research Article
- 10.1142/s0218001424500174
- Aug 24, 2024
- International Journal of Pattern Recognition and Artificial Intelligence
- V V Satyanarayana Tallapragada + 3 more
Emotion recognition is the task of understanding others’ emotions and thoughts. Modern technology allows machines to recognize objects without the need for human intervention. Existing emotion recognition systems have difficulty producing accurate results from limited audio files. To address this problem, a hybrid deep learning model based on a bag of audio terms is introduced, described as a pioneering deep learning model. Input voice data is taken from a large dataset and pre-processed using data normalization and an adaptive bilinear filtering approach. Afterward, acoustic features are extracted from the voice signals to capture related information for emotion recognition. These features can include linear prediction coefficients (LPC), three-dimensional (3D) log-mel spectrum, mel-frequency cepstral coefficients (MFCCs), and prosodic features. Subsequently, feature selection is performed using an improved wild horse optimization (WHO) approach. Finally, a hybrid capsule slime mould dense deep learning framework (HCSDN) is used for voice-based emotion recognition. IEMOCAP and EMODB datasets are used to calculate system performance. The performance metrics indicate that the proposed system achieves 96.78% accuracy, 96.45% specificity, 95.81% precision, 4.256% error rate, 94.256% sensitivity, and 0.75% false positive rate on the IEMOCAP dataset. Similarly, the proposed system achieves 96.85% accuracy, 95.74% specificity, 96.12% precision, 3.432% error rate, 95.25% sensitivity, and 0.62% false positive rate on the EMODB dataset.
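For readers unfamiliar with the metrics listed above, the sketch below gives their standard binary definitions in terms of confusion-matrix counts. The abstract does not say how the authors aggregate over multiple emotion classes, so this is only the textbook two-class form; the function name and the counts are invented for illustration.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary metrics from confusion-matrix counts.

    tp/fp/tn/fn = true positives, false positives, true negatives, false negatives.
    """
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp),
        "sensitivity": tp / (tp + fn),          # also called recall
        "specificity": tn / (tn + fp),
        "false_positive_rate": fp / (fp + tn),
        "error_rate": (fp + fn) / total,
    }

# Hypothetical counts for one emotion class treated as "positive".
metrics = classification_metrics(tp=90, fp=5, tn=95, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In a multi-class setting these quantities are usually computed per class (one-vs-rest) and then averaged, and the averaging scheme affects the reported figures.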
- Research Article
1
- 10.4312/elope.21.1.63-88
- Aug 22, 2024
- ELOPE: English Language Overseas Perspectives and Enquiries
- Alexey Tymbay
A comparative perceptual study involving two experimental groups with different native languages (Russian and Czech) shows that phonologically trained non-native speakers of English are good at identifying basic suprasegmental features of the English language, namely prominence (sentence stress) and accent types, which potentially makes it possible to use their prosodic annotations when validating cross-language intonation research. The occasional failure of both experimental groups to identify certain accent types is explained in the study by the annotators’ mother tongue’s prosodic interference: Czech and Russian speakers rely on different acoustic cues when identifying prosodic features in their native languages and transfer this habit to the discrimination of English prosodic characteristics. The study demonstrates that when a prosodic cue is not marked in the speaker’s mother tongue, it will likely be ignored in the foreign language.
- Research Article
- 10.14434/emt.no.16.39308
- Aug 20, 2024
- Ethnomusicology Translations
- Mahagama Sekera + 2 more
Mahagama Sekera (1929–1976) investigates how rhythmic qualities and poetics inhere in figurative language found in everyday Sinhala speech; the formation of original constructions in poetry and song lyrics; prosodic features in child storytelling; colloquial expressions generated from one light verb; aperiodic rhythms in context-specific utterances and proverbs; and the moraic structures of doublets (yugala pada). Citation: Sekera, Mahagama. “Rhythmic and Poetic Qualities in Sinhala Speech.” Translated by Garrett Field and Ravinda Mahagamasekera; edited by Richard K. Wolf. Ethnomusicology Translations, no. 16. Bloomington, IN: Society for Ethnomusicology, 2024. DOI: Originally published in Sinhala as pages 39–51 in chapter 2, “Rhythm and [Sinhala] Speech” in Sinhala Gadya Padya Nirmānayanhi Ridma Lakṣana (Rhythmic Qualities of Sinhala Prose and Verse), S. Godage & Brothers, 2007 [2001].
- Research Article
1
- 10.1163/19589514-53020006
- Aug 19, 2024
- Faits de Langues
- Karine Martel + 4 more
Speech planning allows the speaker to highlight new information in order to draw the interlocutor’s attention. This focalization process has been described in spoken and sign languages, but rarely in a natural context such as family interaction. In this article, we study focalization at its prosodic level, by comparing the phenomena and parameters mobilized in the two types of language, within a privileged context, in order to observe focus marking during parent-children interaction: the family dinner. We make an inventory of prosodic focalization markers, mobilized in both languages and both modalities (spoken and gestural), from a sample of videorecorded meals in LSF and spoken French (ANR DinLang). We observe that prosodic features of focalization can be established in inter-modal terms and that focus is marked by the combination of various phenomena, which appear in contrast with the prosodic context. We give the example of a pattern that seems interesting to us to examine in the context of focalization, the one of scansion, and we emphasize the interest of considering sign languages and ‘spoken’ languages as embodied languages.
- Research Article
- 10.21203/rs.3.rs-4745684/v1
- Aug 17, 2024
- Research Square
- Gianna Kuhles + 6 more
Machine learning analyses are widely used for predicting cognitive abilities, yet there are pitfalls that need to be considered during their implementation and interpretation of the results. Hence, the present study aimed at drawing attention to the risks of erroneous conclusions incurred by confounding variables illustrated by a case example predicting executive function performance by prosodic features. Healthy participants (n = 231) performed speech tasks and EF tests. From 264 prosodic features, we predicted EF performance using 66 variables, controlling for confounding effects of age, sex, and education. A reasonable model fit was apparently achieved for EF variables of the Trail Making Test. However, in-depth analyses revealed indications of confound leakage, leading to inflated prediction accuracies, due to a strong relationship between confounds and targets. These findings highlight the need to control confounding variables in ML pipelines and caution against potential pitfalls in ML predictions.
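The basic step of controlling for a confound can be sketched as a regression whose parameters are estimated on training data only and then applied unchanged to held-out data. This is a generic illustration, not the study's pipeline, and it does not reproduce the confound leakage the authors report; the helper names and the age/score values are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def residualize(x, y, slope, intercept):
    """Remove the fitted confound effect, leaving the unexplained part of y."""
    return [yi - (slope * xi + intercept) for xi, yi in zip(x, y)]

# Hypothetical data: a test score that depends linearly on age in the training fold.
train_age = [20.0, 30.0, 40.0, 50.0]
train_score = [41.0, 61.0, 81.0, 101.0]
test_age = [60.0]
test_score = [125.0]

# Fit on the training fold only, then apply the same parameters to the test fold.
slope, intercept = fit_line(train_age, train_score)
train_resid = residualize(train_age, train_score, slope, intercept)
test_resid = residualize(test_age, test_score, slope, intercept)
print(slope, intercept, train_resid, test_resid)
```

Estimating the confound model on all data before splitting would let test-set information influence the training features, which is one of the simpler ways a prediction accuracy can end up inflated.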
- Research Article
7
- 10.1002/jdn.10366
- Aug 12, 2024
- International journal of developmental neuroscience : the official journal of the International Society for Developmental Neuroscience
- Gabriela Cintra Januário + 4 more
Functional near-infrared spectroscopy and language development: An integrative review.