Related Topics

  • Interactive Video
  • Interactive TV

Articles published on Interactive audio

144 Search results
  • New
  • Research Article
  • 10.36922/ac025180032
Interactive audio toolkit: Creating sonic experiences and installations with low-cost, low-power microcontrollers
  • Nov 20, 2025
  • Arts & Communication
  • Aman Jagwani + 1 more

Interactive sound installations and experiences are increasingly prevalent in museums, galleries, events, and various spaces worldwide. These installations are often powered by microcontroller boards running programs that control interactivity and possibly even audio processing. Commonly used microcontrollers such as the ESP32 and Raspberry Pi Pico are generic, necessitating custom implementations for audio-specific functionalities. Existing audio libraries often have limited signal processing features and may not be designed with interactivity in mind. This paper presents the Interactive Audio Toolkit, which focuses on the seamless integration of sound generation with sensors to create interactive sound experiences with low-power, low-cost microcontrollers. This C++ toolkit is structured around sensor input and audio output classes, providing a simple interface to generate flexible audio responses from sensor interactions. This paper details the toolkit’s structure, components, and workflow, highlighting its ability to foster new forms of sonic and musical interactions for artists and audiences. A case study of an installation utilizing the toolkit is also presented.
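The abstract describes a structure built around sensor input and audio output classes. The following is only an illustrative sketch of that idea, written in Python rather than the toolkit's C++ so it can be run anywhere. The class names `SensorInput` and `SineVoice`, the 12-bit raw range, and the 110 to 880 Hz pitch range are all assumptions made for this example and are not taken from the toolkit.

```python
import math


class SensorInput:
    """Hypothetical sensor wrapper: scales a raw reading to the range 0.0-1.0."""

    def __init__(self, raw_max=4095):  # assumption: 12-bit analog reading
        self.raw_max = raw_max

    def normalized(self, raw):
        # Clamp out-of-range readings before scaling.
        clamped = max(0, min(raw, self.raw_max))
        return clamped / self.raw_max


class SineVoice:
    """Hypothetical audio output: a sine oscillator whose pitch follows a control value."""

    def __init__(self, sample_rate=44100, low_hz=110.0, high_hz=880.0):
        self.sample_rate = sample_rate
        self.low_hz = low_hz
        self.high_hz = high_hz
        self.phase = 0.0

    def frequency_for(self, control):
        # Exponential mapping: equal sensor steps give equal musical intervals.
        return self.low_hz * (self.high_hz / self.low_hz) ** control

    def render(self, control, n_samples):
        # Produce n_samples of a sine wave at the frequency selected by control.
        step = 2.0 * math.pi * self.frequency_for(control) / self.sample_rate
        samples = []
        for _ in range(n_samples):
            samples.append(math.sin(self.phase))
            self.phase = (self.phase + step) % (2.0 * math.pi)
        return samples


sensor = SensorInput()
voice = SineVoice()
block = voice.render(sensor.normalized(2048), 64)  # one small audio buffer
```

On a real microcontroller the rendered block would be handed to an audio peripheral; that part depends on the hardware and is left out here.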

  • Research Article
  • 10.47772/ijriss.2025.925ileiid00005
“Phonetics to the Rescue”: A Gamified Approach to Learning Phonetics
  • Nov 4, 2025
  • International Journal of Research and Innovation in Social Science
  • Nur Haika Binti Rosle + 3 more

“Phonetics to the Rescue” is a game-changer for mastering one of linguistics’ toughest subjects. Phonetics can intimidate even the most motivated students as its intricate sound system, unfamiliar symbols, and scarce quality resources often make it feel like a maze. Yet, mastering the International Phonetic Alphabet (IPA) is vital for accurate pronunciation and confident communication. This web-based educational game turns that challenge into an adventure. Players step into the shoes of a daring protagonist on a mission to escape, but every door forward is locked by a phonetics puzzle. From matching IPA symbols to recognising tricky sounds and transcribing speech, each challenge blends interactive audio recognition with fast-paced problem-solving. The innovation lies in its fusion of rigorous phonetics practice with immersive, narrative-driven gameplay. By making learning feel like play, it breaks down anxiety, sustains engagement, and makes even the most complex concepts approachable. “Phonetics to the Rescue” is flexible for both classroom and independent use, adapting to different skill levels while keeping learners hooked. The outcome is more than just improved grades, as students gain sharper listening skills, better pronunciation, and lasting phonetic confidence. Fun, focused, and future-ready, this invention makes phonetics a subject students want to master.

  • Research Article
  • 10.5194/isprs-archives-xlviii-g-2025-1741-2025
Urban Traffic Noise Analysis with The Integration of Vision-Language Model and 3DWebGIS
  • Aug 2, 2025
  • The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences
  • Yueying Zhang + 3 more

In the context of rapid urbanization, traffic noise pollution has emerged as a critical environmental issue. This study proposes an innovative three-dimensional dynamic noise mapping method addressing existing technical challenges in noise modeling and visualization. The key innovations are (1) utilizing the Grounding DINO large-scale vision-language model to automatically extract traffic flow information from video surveillance data, significantly improving data acquisition efficiency and accuracy, and (2) developing a Web-based three-dimensional visualization system on the Cesium platform that supports interactive dynamic noise distribution display and introduces an audio feedback mechanism. The research method combines deep learning with spatiotemporal correlation analysis to effectively capture noise source parameters. The noise model adopts the CNOSSOS-EU standard, considering multiple factors including geometric divergence attenuation, atmospheric absorption, ground effects, and building reflection and diffraction. Using Jingxiu and Lianchi Districts in Baoding City, Hebei Province as a case study, the research validates the method’s effectiveness. The three-dimensional visualization results demonstrate the approach’s superior ability to reflect the physical characteristics of real-world acoustic environments, providing crucial technical support for urban planning and noise control decision-making. Overall, the method offers improved noise distribution accuracy, dynamic visualization capabilities, and interactive audio feedback, providing a novel technical approach to urban noise assessment.
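Of the propagation factors listed in the abstract, geometric divergence is the simplest to illustrate. The sketch below uses the point-source form commonly written as A_div = 20·log10(d) + 11 dB, with d in metres. It covers that single term only; atmospheric absorption, ground effects, reflection, and diffraction are ignored. The function names are hypothetical and the study's own implementation is not shown in the abstract.

```python
import math


def geometric_divergence_db(distance_m):
    """Attenuation in dB due to spherical spreading from a point source."""
    if distance_m <= 0:
        raise ValueError("distance must be positive")
    return 20.0 * math.log10(distance_m) + 11.0


def level_after_divergence(sound_power_level_db, distance_m):
    """Receiver level when geometric divergence is the only attenuation applied."""
    return sound_power_level_db - geometric_divergence_db(distance_m)
```

Under this term alone, each doubling of distance lowers the received level by about 6 dB.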

  • Research Article
  • 10.18357/kula.288
Revisiting Ranganathan
  • Jul 15, 2025
  • KULA: Knowledge Creation, Dissemination, and Preservation Studies
  • A.M Alpin + 1 more

Rule N°5: The Library Is a Growing Organism is a collaboratively created installation located in NYU’s main library that centers the voices of library workers in six sculptures. The work takes its name from Ranganathan’s five laws or “rules” of library science published in 1931 and focuses on how the spirit of the fifth rule or law, which reads “the library is a growing organism,” continues to inform the life and work of the library. As an interactive audio experience, Rule N°5 invites listeners to open doors and drawers, plug in, and push buttons to explore what it means to collect the world’s knowledge, preserve the past, and shape the future. Rule N°5 examines practices and objects that shape how we can search, who we will find, and what we remember. In this article, the makers behind the installation share stories of the project’s creation and a deeper inspection of the invisible labor and conversations about questions of power, authority, and the politics of seemingly innocuous library work that inform the work.

  • Research Article
  • 10.5070/l2.21148
Understanding L2 Online Peer Tutoring Participation Through Response Formulations To Advice: A Case Study
  • Jun 2, 2025
  • L2 Journal
  • Mei-Hsing Tsai

Applying the single case analysis methodology, this study investigates how a peer tutee demonstrated his understanding of his peer tutor’s advice by using a particular type of formulation, response formulations to advice, to negotiate the pair’s participation framework in second language (L2) online peer tutorials. Although the tutor initially held the floor more often during the verbal interaction in the online tutorial, the tutee was able to move from peripheral to fuller participation by offering response formulations to advice. Two types of response formulations to advice were identified: (1) expressing the gist of the tutor’s advice to indicate direct comprehension, and (2) constructing an upshot to add a new viewpoint to the previous utterance by conveying unstated information (e.g., offering an account of what was not discussed in previous discourse). The tutee’s response formulations to advice provided the tutor with an opportunity to access the tutee’s L2 knowledge and thereby offer more mediation to enhance the tutee’s metalinguistic knowledge of the target language. The tutee’s upshot formulation in particular also influenced the type of advice he received. Overall, this study highlights how peer tutees’ use of response formulations to advice, enhanced by the affordances of online mediation such as real-time audio interaction, facilitates greater conversational participation and deeper engagement, which leads to more meaningful, cooperative, and effective tutorials.

  • Research Article
  • 10.29039/2949-1258/2024-4/060-072
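Interactive audio guide as a tool for excursion activities in the context of Smart tourism development [Интерактивный аудиогид как инструмент экскурсионной деятельности в контексте развития Smart-туризма]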
  • Feb 18, 2025
  • THE TERRITORY OF NEW OPPORTUNITIES OPENS FOR INVESTMENT PROJECTS OF THE FUTURE
  • Galina Gomilevskaya + 1 more

The article examines the innovative nature of sightseeing activities, taking into account the use of an interactive audio guide in the context of the development of smart tourism. Sightseeing and educational tourism is the most popular segment of the tourist market and is gradually improving with the advent of new forms and approaches to its organization. In recent years, the integration of artificial intelligence has become relevant. Advanced technologies are revolutionizing various industries due to the advantages they can offer. Among guided tour formats, audio guided tours are currently gaining significant popularity, because an audio guide provides comprehensive information about the place visited. It lets sightseers decide how much time to devote to a particular object without depending on a guide. Mobility and accessibility are the main advantages of an audio guide over a live story. The aim of the work is to develop the technological foundations of an interactive guided walking audio guide using artificial intelligence. The results of the work comprise the analysis of world and domestic experience; the development of a classification system for interactive excursion services; the analysis of ways to introduce innovative technologies in tourism; and the presentation of the structure of an interactive guided walking audio guide. The scientific novelty of the research is the formation of the technological foundations of an innovative audio guide and a classification system for interactive excursion services. The practical significance lies in evaluating the ways of implementing a neural network and a chatbot in the development of an excursion service.

  • Research Article
  • 10.37304/ebony.v5i1.18015
Investigating the Use of Mobile-Assisted Language Learning (MALL) for Listening Instructions in SMAN 5 Tanjungpinang
  • Jan 15, 2025
  • EBONY: Journal of English Language Teaching, Linguistics, and Literature
  • Zidan Dwi Khalfani Kareem + 5 more

Listening is essential for effective communication but often poses challenges in traditional teaching settings, where exposure to authentic materials is limited (Vandergrift & Goh, 2012). However, Mobile-Assisted Language Learning (MALL) provides mobile access to interactive audio content, enabling adaptive learning experiences (Burston, 2014) and continuous exposure to varied accents (Godwin-Jones, 2017). This study explores the implementation of MALL for listening instructions in SMAN 5 Tanjungpinang, focusing on the research questions: “Has SMAN 5 Tanjungpinang incorporated MALL into their teaching practices?” and “How does MALL contribute to improving SMAN 5 Tanjungpinang students' listening skills?” This qualitative research employs observation and semi-structured interviews to examine MALL's application in real classroom settings, aligning with Creswell's (2014) emphasis on context and utilizing thematic analysis to identify emerging patterns (Braun & Clarke, 2013). Findings reveal that while SMAN 5 Tanjungpinang has started incorporating MALL, there is significant potential to enhance its integration further and positively impact students' listening skills. These results highlight the importance of integrating mobile technology into educational curricula to enhance language learning outcomes.

  • Open Access
  • Research Article
  • Cite count: 1
  • 10.3389/frobt.2024.1356477
Building for speech: designing the next-generation of social robots for audio interaction.
  • Jan 3, 2025
  • Frontiers in robotics and AI
  • Angus Addlesee + 1 more

There have been significant advances in robotics, conversational AI, and spoken dialogue systems (SDSs) over the past few years, but we still do not find social robots in public spaces such as train stations, shopping malls, or hospital waiting rooms. In this paper, we argue that early-stage collaboration between robot designers and SDS researchers is crucial for creating social robots that can legitimately be used in real-world environments. We draw from our experiences running experiments with social robots, and the surrounding literature, to highlight recurring issues. Robots need better speakers, a greater number of high-quality microphones, quieter motors, and quieter fans to enable human-robot spoken interaction in the wild. If a robot was designed to meet these requirements, researchers could create SDSs that are more accessible, and able to handle multi-party conversations in populated environments. Robust robot joints are also needed to limit potential harm to older adults and other more vulnerable groups. We suggest practical steps towards future real-world deployments of conversational AI systems for human-robot interaction.

  • Research Article
  • 10.63216/alulum.v2i02.359
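Development of Web-Based Media to Improve Islamic Religious Education Learning in Elementary School [PENGEMBANGAN MEDIA BERBASIS WEB UNTUK MENINGKATKAN PEMBELAJARAN AGAMA ISLAM SD]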
  • Dec 28, 2024
  • AL-ULUM | JURNAL PENDIDIKAN DAN PENGAJARAN
  • Muhammad Imam Mulyana + 3 more

Abstract: This study aims to develop a web-based application as a learning medium for Islamic Religious Education (PAI) for elementary school students to address challenges in conventional teaching methods that lack engagement. The research employed the Research and Development (R&D) method with the ADDIE development model, encompassing the stages of analysis, design, development, implementation, and evaluation. The subjects were sixth-grade students from elementary schools in Banjarbaru City, with data collection conducted through interviews, observations, and literature reviews. Evaluation was carried out using User Acceptance Testing (UAT) and pretest-posttest methods to assess the application’s effectiveness. The findings reveal that the application effectively enhances students’ motivation and learning outcomes, with an average score increase of 10 points. Statistical tests indicated a significant difference between conventional and application-based learning (p < 0.10). The application features interactive audio, animations, and visuals, making the learning process more engaging. UAT results demonstrated that the application meets quality and user comfort standards. This study concludes that integrating technology into PAI learning offers an innovative solution for improving students' understanding of religious values. Further development is recommended, including the addition of interactive features, broader trials, and integration into mobile platforms to enhance application accessibility.

  • Research Article
  • 10.55606/jig.v3i1.3401
Increasing Knowledge of Fiber Consumption among University Students Using Audio and Visual Media [Meningkatkan Pengetahuan Konsumsi Serat Dikalangan Mahasiswa Menggunakan Media Audio dan Visual]
  • Dec 16, 2024
  • Jurnal Ilmu Kesehatan dan Gizi
  • Elzalika Aisyiyah Agsya + 5 more

Low fiber consumption among university students increases the risk of chronic diseases such as coronary heart disease and diabetes. Many students prioritize convenient eating habits and are influenced by peer groups, often overlooking the importance of fiber intake in their daily diet. This educational program aims to enhance students' knowledge about the benefits of fiber consumption through interactive audio and visual media. The program applies the Health Belief Model (HBM) theory to design educational content in the form of videos and posters distributed via the social media platform Instagram. Involving 26 student respondents from various faculties at Halu Oleo University, the results showed a significant improvement in students' understanding of the health benefits of fiber, types of fiber-rich foods, and the role of fiber in preventing chronic diseases. Education through engaging media successfully increased students' motivation to improve their dietary habits by incorporating more fiber into their diets.

  • Research Article
  • 10.54254/3049-5458/2024.18639
FASSLING: Transforming emotional and coaching support through artificial intelligence (AI) innovation
  • Dec 16, 2024
  • Journal of Clinical Technology and Theory
  • Yujia Zhu

The global mental health crisis is compounded by barriers such as cost, accessibility, and stigma, leaving millions without adequate support. FASSLING (fassling.ai), an innovative artificial intelligence (AI)-powered platform, addresses these challenges by providing free, 24/7 multilingual emotional and coaching support through text and audio interactions. Grounded in inclusivity and compassion, FASSLING bridges gaps in traditional mental health systems by offering immediate, non-clinical support while complementing professional services. This paper explores FASSLING's design and implementation, emphasizing its user-centered features, including cultural adaptability, trauma-informed care principles, and active listening techniques. The platform not only empowers users to navigate emotional challenges but also fosters resilience and empathy, creating a ripple effect of societal compassion. Ethical considerations, such as ensuring user privacy and managing the limitations of AI, are central to FASSLING's mission. By integrating advanced AI technologies with psychological best practices, FASSLING sets a new standard for accessible and inclusive mental health support, positioning itself as a transformative tool for global well-being. This case study highlights FASSLING's potential to redefine emotional support systems and drive positive change in mental health care worldwide.

  • Research Article
  • Cite count: 1
  • 10.26034/cm.jostrans.2024.5980
Professional and novice audio describers: quality assessments and audio interactions
  • Jul 29, 2024
  • The Journal of Specialised Translation
  • Sawako Nakajima + 1 more

Empowering novice describers can reduce costs and expand access to high-quality audio descriptions (ADs). This study explored differences between novice and professional practices by analysing their ADs for a 3:42-minute scene from a Japanese fictional film. A film producer rated both the overall quality and volume quality of ADs. The perceived AD volume quality reflects the comprehensive volume experience within ADs beyond loudness. The assessment revealed that ADs created by ten novices using speech synthesis reached approximately 60% of both the overall quality and volume quality of published ADs with human voice. Kernel density estimation showed significantly lower mean loudness in published ADs than in novice ADs. Additionally, a significant negative correlation existed between perceived AD volume quality and mean film loudness during AD presentation across all AD sets. However, published ADs had longer durations compared to novice ADs. Contrasting cueing strategies were observed. Published ADs relied on film sounds, whereas novice ADs leaned on visual cues. Consequently, we developed a professional technique: carefully curating the film information to be heard and balancing AD placement to ensure the audio experience of both ADs and film sound without abrupt AD loudness increases. This sonic approach empowers novices to craft impactful ADs.

  • Open Access
  • Research Article
  • Cite count: 1
  • 10.47476/jat.v7i1.2024.267
Co-design of a Voice-Driven Interactive Smart Guide for Museum Accessibility and Management
  • Jun 17, 2024
  • Journal of Audiovisual Translation
  • Xi Wang

This paper describes the process of co-designing and creating a voice-driven interactive smart audio descriptive guide for Titanic Belfast, a world-leading tourist attraction. This smart audio descriptive guide is intended to enhance museum accessibility and visitor experience, especially for blind and partially sighted (BPS) visitors. A key research question is to what extent museums can conveniently produce their own smart guide to enrich the visitor experience for BPS visitors. The paper first outlines the necessarily complex set of team functional roles and users in designing the smart audio descriptive guide and then presents the main challenges and opportunities arising from the key user requirements from both the BPS visitor and the museum management perspectives. The main design features of the smart audio descriptive guide, which address these requirements, are then described. The paper then outlines the main findings of our evaluative review of the smart guide with a group of BPS participants and from a museum management perspective. One of the key benefits of our approach is that the smart audio descriptive guide has the potential to offer museums and cultural venues a new, affordable approach to providing and maintaining a high-quality accessibility experience with lower design effort than traditional audio descriptive guide approaches.
Lay summary: I developed a voice-driven interactive audio descriptive guide specifically for Titanic Belfast, one of the leading tourist attractions globally. Audio descriptive guides are commonly used in museums to provide verbal descriptions of visual information for blind and partially sighted (BPS) visitors. Most of these guides are keypad based and cannot answer questions from visitors. My guide is designed to incorporate new features, such as voice control and a chat function, to enhance accessibility and visitor experience, primarily for BPS people. The creation of this guide involved a detailed process where the needs of the BPS visitors and museum management were carefully considered to address various challenges and seize opportunities to enhance accessibility. Key design features of the guide were developed to meet these specific needs, making the museum experience more inclusive. An evaluation conducted with BPS participants has shown some promising results. The new guide not only meets their needs effectively but also offers an affordable solution for museums. This approach reduces the effort and cost typically required to create an audio descriptive guide, presenting a sustainable option that could be adopted by other cultural venues looking to improve accessibility and visitor engagement.

  • Research Article
  • 10.55057/ijbtm.2024.6.2.10
How AI Enhances Retail Experiences?
  • Jun 1, 2024
  • International Journal of Business and Technology Management

Artificial intelligence (AI) is a powerful tool that can significantly enhance customer service and build customer trust, particularly for companies seeking to maintain a competitive edge in the retail business landscape. This study employs a thematic approach to investigate how automating the shopping experience with AI can enhance the retail customer experience. The study utilized random sampling techniques and interviewed 350 respondents at the shopping centre. Thematic content analysis was used to analyse the data. This method identifies interesting themes or patterns in the data and uses them to address research questions. It goes beyond simple data summarisation and involves interpreting and understanding the data. The results indicate that AI enhances the retail experience in various areas, such as customer service, product descriptions, marketing efforts, training, and interactive audio. Face-to-face interviews with retail managers revealed that four themes emerged from the discussion. Further research is required to generalise the findings of this study. This will undoubtedly assist retailers in planning their business to achieve operational efficiency while remaining competitive in the era of AI.

  • Open Access
  • Research Article
  • Cite count: 2
  • 10.1016/j.measen.2024.101155
Application of IoT audio technology based on sensor networks in English speaking teaching system
  • Apr 5, 2024
  • Measurement: Sensors
  • Zhenzhu Wang + 4 more

  • Research Article
  • Cite count: 3
  • 10.1177/13621688241238045
Recasts, foreign language anxiety and L2 development during online mobile-mediated interaction
  • Apr 3, 2024
  • Language Teaching Research
  • Ehsan Rassaei

Despite the wealth of studies on corrective feedback (CF) and its relationship with individual learner factors, little is known about how foreign language (FL) anxiety moderates the effectiveness of recasts during mobile-mediated audio interactions. The present study thus examined the association between learners’ FL anxiety, the effectiveness of recasts, and learners’ responses to recasts during synchronous mobile-mediated interactions via audio call. Two intact classes of EFL (English as a Foreign Language) learners were assigned into a control group and an experimental condition. After taking pre-tests, the participants of the experimental condition participated in four sessions of mobile-mediated oral interaction with an interlocutor via WhatsApp and received recasts for their definite and indefinite article errors. The participants of the control group also participated in the mobile-mediated interactions but received no recasts for their errors. Learners’ improvement was measured on two occasions following the fourth treatment session. The participants’ anxiety was also measured as a continuous variable using a 5-point Likert scale. Mixed between-within group ANCOVA results and regression analysis provided evidence for the efficacy of recasts delivered during mobile-mediated interactions, as well as the significant role of learners’ anxiety as a predictor of the effectiveness of recasts. The results also indicated that learners with low anxiety were significantly more successful in modifying their incorrect forms following recasts compared to learners with higher anxiety during the mobile-mediated interactions.

  • Open Access
  • Research Article
  • Cite count: 1
  • 10.1080/15475441.2024.2313221
Now You See Me, Now You don’t: Children Learn Grammatical Choices During Online Socially Contingent Video and Audio Interactions
  • Mar 31, 2024
  • Language Learning and Development
  • Leone Buckle + 3 more

Previous research has established that children’s experiences of language during in-person interactions (e.g. individual and cumulative experiences of structural choices) implicitly shape language learning. We investigated whether children also implicitly learn structural choices during online interactions, and whether this is affected by the visual co-presence of a partner. During an online conference call, three- and five-year-olds alternated describing pictures with an experimenter who produced active (“a cat chased the dog”) and passive (“the dog was chased by a cat”) prime descriptions; half the participants had video+audio calls, and half had audio-only. Children in both age groups produced more passives after passive than active primes, both immediately and with accumulating input across trials; neither effect was influenced by call format (video+audio vs audio-only). These results demonstrate that implicit grammar learning mechanisms, as evidenced by syntactic priming effects, operate during socially contingent online interactions. They also highlight the potential of online methodologies for developmental language production research.

  • Research Article
  • 10.55920/2771-019x/1192
Social audio as a tool for public health interventions
  • Jul 14, 2023
  • Journal of Clinical and Medical Images, Case Reports
  • Dominic Arjuna Ugarte

There is a constant need for new approaches and technologies to help with public health study participant recruitment, retention, and interventions. For approximately 15 years, social media text and image/video platforms (e.g., Facebook, and more recently, Instagram, Snapchat, and TikTok) have been increasingly used to assist with these issues in research. However, a new social media format based on audio interactions (i.e., social audio) is gaining popularity and may have advantages over traditional social media platforms. This paper explores how social audio might be used in public health interventions and how it differs from traditional text- and image/video-based social media platforms.

  • Open Access
  • Abstract
  • 10.1016/j.jval.2023.03.908
EPH26 Barriers to in-Person Focus Group Participation during the Third-Year of COVID-19 Pandemic: A Case Study of Colorectal Cancer (CRC) Screening in Underrepresented Groups
  • Jun 1, 2023
  • Value in Health
  • R Rasu + 7 more

  • Open Access
  • Research Article
  • Cite count: 4
  • 10.3390/fi15020065
Multi-Scale Audio Spectrogram Transformer for Classroom Teaching Interaction Recognition
  • Feb 2, 2023
  • Future Internet
  • Fan Liu + 1 more

Classroom interactivity is one of the important metrics for assessing classrooms, and identifying classroom interactivity through classroom image data is limited by the interference of complex teaching scenarios. However, audio data within the classroom are characterized by significant student–teacher interaction. This study proposes a multi-scale audio spectrogram transformer (MAST) speech scene classification algorithm and constructs a classroom interactive audio dataset to achieve interactive teacher–student recognition in the classroom teaching process. First, the original speech signal is sampled and pre-processed to generate a multi-channel spectrogram, which enhances the representation of features compared with single-channel features. Second, in order to efficiently capture the long-range global context of the audio spectrogram, the audio features are globally modeled by the multi-head self-attention mechanism of MAST, and the feature resolution is reduced during feature extraction to continuously enrich the layer-level features while reducing the model complexity. Finally, a further combination with a time-frequency enrichment module maps the final output to a class feature map, enabling accurate audio category recognition. MAST is compared experimentally on public environmental audio datasets and on the self-built classroom audio interaction dataset. Compared with the previous state-of-the-art methods on the public datasets AudioSet and ESC-50, its accuracy is improved by 3% and 5%, respectively, and its accuracy on the self-built classroom audio interaction dataset reaches 92.1%. These results demonstrate the effectiveness of MAST in the field of general audio classification and the smart classroom domain.
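The multi-head self-attention step mentioned in the abstract can be illustrated with a deliberately reduced sketch. It is not the MAST model: the learned query, key, value, and output projections of a real transformer are omitted (queries, keys, and values are simply the input vectors), and the "patches" are made-up numbers standing in for spectrogram patch features. All function names are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(tokens):
    """Scaled dot-product self-attention with queries = keys = values = tokens."""
    dim = len(tokens[0])
    out = []
    for query in tokens:
        scores = [sum(a * b for a, b in zip(query, key)) / math.sqrt(dim)
                  for key in tokens]
        weights = softmax(scores)
        # Each output is a weighted average of all token vectors.
        out.append([sum(w * value[i] for w, value in zip(weights, tokens))
                    for i in range(dim)])
    return out


def multi_head_self_attention(tokens, num_heads):
    """Split the feature dimension into heads, attend per head, then concatenate."""
    dim = len(tokens[0])
    if dim % num_heads != 0:
        raise ValueError("feature dimension must be divisible by num_heads")
    head_dim = dim // num_heads
    heads = []
    for h in range(num_heads):
        start = h * head_dim
        sliced = [token[start:start + head_dim] for token in tokens]
        heads.append(attention(sliced))
    return [sum((head[i] for head in heads), []) for i in range(len(tokens))]


# Three made-up patch vectors with four features each, attended with two heads.
patches = [[1.0, 0.0, 0.5, 0.5], [0.0, 1.0, 0.5, 0.5], [1.0, 1.0, 0.0, 1.0]]
mixed = multi_head_self_attention(patches, 2)
```

Because every output row is a weighted average over all input rows, each patch can draw on information from any other patch regardless of distance, which is the long-range global context the abstract refers to.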


Copyright 2025 Cactus Communications. All rights reserved.
