Articles published on Sound event detection

278 Search results
  • Research Article
  • 10.1016/j.dsp.2026.105993
Sound event localization and detection based on multi-scale attentional feature fusion
  • May 1, 2026
  • Digital Signal Processing
  • Juan Wei + 2 more

  • Research Article
  • 10.1016/j.trd.2026.105241
Sound event detection for modified-exhaust vehicles in urban environment
  • Apr 1, 2026
  • Transportation Research Part D: Transport and Environment
  • Zirun Wang + 6 more

  • Research Article
  • 10.1016/j.neucom.2026.132880
A Dual Consistency Training (DCT) strategy for polyphonic sound event detection
  • Apr 1, 2026
  • Neurocomputing
  • Qiong Wu + 4 more

  • Research Article
  • 10.1109/jbhi.2026.3679603
From Neural Representations to Brain-Inspired Architecture: Decoding Auditory Target Perception in Complex Scenes.
  • Mar 31, 2026
  • IEEE Journal of Biomedical and Health Informatics
  • Jianting Shi + 5 more

In complex acoustic scenarios, the human auditory system excels at rapidly and accurately identifying target sounds. A deeper understanding of its mechanisms in such environments could significantly enhance the robustness and generalization capabilities of sound event detection systems. This paper investigates the neural representations and decoding of target sound perception in complex auditory scenes using non-invasive neural signals. We propose a novel experimental paradigm that simulates complex acoustic conditions by varying parameters such as target sound characteristics, background noise levels, and interfering events. Multi-view neural representations, including time, frequency, and source domains, are extracted and analyzed using statistical methods to examine their relationships with these variables. To decode these neural activities, we propose the Auditory Cortex-inspired Dual Attention Network (AC-DANet), an architecture functionally inspired by known attentional pathways in the auditory cortex. The model achieves robust three-class electroencephalogram (EEG) decoding for target sound perception in challenging auditory scenes, with experimental results demonstrating strong performance and cross-subject generalization. This study advances our understanding of the neural information transmission process underlying sound target perception in complex acoustic environments. It offers novel insights into the cognitive functions of the human auditory system, while providing a theoretical foundation and technical framework for the development of advanced sound event detection systems in challenging acoustic settings.

  • Research Article
  • 10.1038/s41598-026-47018-3
Deep learning-based detection of bowel sound events in continuous recordings.
  • Mar 30, 2026
  • Scientific Reports
  • Yusuf Çelik

  • Research Article
  • 10.3390/bdcc10030083
Sound Event Detection in Smart Cities: A Systematic Review of Methods, Datasets, and Applications
  • Mar 8, 2026
  • Big Data and Cognitive Computing
  • Giuseppe Ciaburro + 1 more

Sound Event Detection (SED) is a growing area with vast prospects for understanding and designing the sonic fabric of smart cities. In this paper, the latest advances in SED are summarized, focusing on models, datasets, and applications from scientific papers listed on Scopus and Web of Science. The paper provides a clear view of how SED is being used in smart cities, public safety, environment monitoring, and home security. The paper also addresses the challenges of SED, including dataset representativeness, model robustness under noisy or complex acoustic scenes, event rarity detection, as well as the ethics of using automatic listening. The paper also provides a view of future work to be undertaken in SED. The focus of the paper is on self-supervised learning, multi-modal fusion, neuro-inspired approaches, as well as privacy-preserving analytics. The paper provides a view of SED as a key technology to make smart cities safe, secure, and sustainable. SED has vast prospects as a key technology to enable artificial perception of smart cities.

  • Research Article
  • 10.1016/j.measurement.2026.120297
LSKFDY-CNN: Large selective kernel frequency dynamic convolutional neural network for sound event detection
  • Mar 1, 2026
  • Measurement
  • Zhuangzhuang Wang + 4 more

  • Research Article
  • Cite Count Icon 1
  • 10.1016/j.bspc.2025.108491
EZhouNet: A framework based on graph neural network and anchor interval for the respiratory sound event detection
  • Feb 1, 2026
  • Biomedical Signal Processing and Control
  • Yun Chu + 4 more

  • Research Article
  • 10.1109/access.2026.3671264
Sound Event Detection System With Frequency-Aware Enhancements and Semi-Supervised Learning
  • Jan 1, 2026
  • IEEE Access
  • Narin Kim + 4 more

Sound Event Detection (SED) systems are essential for understanding and classifying the causes and temporal occurrences of sounds in diverse environments. This paper introduces a robust and efficient SED system that integrates a novel Frequency-aware Lightweight Convolutional Attention Module (FLCAM) and semi-supervised learning techniques to address key challenges in audio analysis. The FLCAM enhances 2D convolutional models by preserving critical frequency-domain features and adaptively assigning attention weights, enabling superior performance while maintaining computational efficiency. To fully leverage strongly labeled, weakly labeled, and unlabeled data, our system employs the Mean Teacher framework, which ensures consistency between predictions under different augmentations. Comprehensive experiments on the DESED and L3DAS22 datasets demonstrate the effectiveness of our approach, achieving improvements of approximately 9 percentage points in PSDS and 2 percentage points in F-score metrics, respectively. Despite utilizing significantly fewer parameters, the proposed SED system achieves performance comparable to state-of-the-art models, making it suitable for real-world applications, including resource-constrained environments.
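The Mean Teacher idea mentioned in this abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the teacher's weights track an exponential moving average (EMA) of the student's weights, and a consistency term penalizes disagreement between the two models' predictions. The names `ema_update` and `consistency_loss`, the decay value, and the toy numbers are assumptions made for illustration only.

```python
def ema_update(teacher, student, decay=0.999):
    """Return new teacher weights as an EMA of the student weights."""
    return [decay * t + (1.0 - decay) * s for t, s in zip(teacher, student)]


def consistency_loss(student_probs, teacher_probs):
    """Mean squared error between student and teacher predictions."""
    n = len(student_probs)
    return sum((s - t) ** 2 for s, t in zip(student_probs, teacher_probs)) / n


# Toy weights (hypothetical values): the teacher drifts slowly toward the student.
teacher = [0.0, 1.0]
student = [1.0, 1.0]
teacher = ema_update(teacher, student, decay=0.9)

# Toy per-class probabilities for one unlabeled frame (hypothetical values).
loss = consistency_loss([0.8, 0.2], [0.6, 0.4])
```

In a real system the consistency term would be added to the supervised loss on the labeled clips, so unlabeled audio still contributes a training signal.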

  • Research Article
  • 10.1109/taslpro.2026.3677672
HDA-SELD: Hierarchical Cross-Modal Distillation With Multi-Level Data Augmentation for Low-Resource Audio-Visual Sound Event Localization and Detection
  • Jan 1, 2026
  • IEEE Transactions on Audio, Speech and Language Processing
  • Qing Wang + 5 more

This work presents HDA-SELD, a unified hierarchical distillation and augmentation framework for audio-visual (AV) sound event localization and detection (SELD) designed to address the challenge of data scarcity. The proposed framework integrates hierarchical cross-modal distillation (HCMD) to transfer knowledge from a robust audio-only SELD teacher to an AV student through both output responses and intermediate hidden representations. To enhance learning, we introduce a multi-level data augmentation strategy that mixes features randomly selected from multiple network layers and associated loss functions tailored to the SELD task. By employing loss interpolation instead of direct label manipulation, the strategy ensures spatial consistency during the augmentation process. Extensive experiments on the DCASE 2023 and 2024 Challenge SELD datasets show that the proposed method significantly improves AV SELD performance, yielding relative gains of 21%-38% in the overall metric over the baselines. Notably, our proposed HDA-SELD achieves results comparable to or better than teacher models trained on much larger datasets, surpassing state-of-the-art methods on both DCASE 2023 and 2024 Challenge SELD tasks.
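The phrase "loss interpolation instead of direct label manipulation" can be illustrated with a mixup-style sketch. This is a simplified reading of the abstract, not the paper's method: features from two examples are blended with a weight `lam`, and the loss is a weighted sum of the losses against each original target rather than a loss against a blended target. The helper names, the mean-squared-error loss, and the toy vectors are all hypothetical.

```python
def mse(pred, target):
    """Mean squared error between two equal-length vectors."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def mix_features(feat_a, feat_b, lam):
    """Blend two feature vectors with weight lam on the first."""
    return [lam * a + (1.0 - lam) * b for a, b in zip(feat_a, feat_b)]


def interpolated_loss(pred, target_a, target_b, lam, loss_fn=mse):
    """Interpolate the two losses; the targets themselves stay untouched."""
    return lam * loss_fn(pred, target_a) + (1.0 - lam) * loss_fn(pred, target_b)


lam = 0.7
mixed = mix_features([1.0, 0.0], [0.0, 1.0], lam)

# Hypothetical model output for the mixed input, scored against both targets.
pred = [0.5, 0.5]
loss = interpolated_loss(pred, [1.0, 0.0], [0.0, 1.0], lam)
```

One plausible motivation, consistent with the abstract's mention of spatial consistency, is that averaging two direction targets would yield a location matching neither source, whereas interpolating losses keeps each target as annotated.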

  • Research Article
  • 10.1109/lsp.2026.3685150
Reverberation-based Features for Sound Event Localization and Detection with Distance Estimation
  • Jan 1, 2026
  • IEEE Signal Processing Letters
  • Davide Berghi + 1 more

  • Research Article
  • Cite Count Icon 2
  • 10.1016/j.fss.2025.109638
Modeling uncertainty with interval-valued type-2 fuzzy sets: Application to anomalous sound event detection
  • Jan 1, 2026
  • Fuzzy Sets and Systems
  • Zied Mnasri + 2 more

  • Research Article
  • 10.3390/app16010205
YOLO-Based Transfer Learning for Sound Event Detection Using Visual Object Detection Techniques
  • Dec 24, 2025
  • Applied Sciences
  • Sergio Segovia González + 2 more

Traditional Sound Event Detection (SED) approaches are based on either specialized models or these models in combination with general audio embedding extractors. In this article, we propose to reframe SED as an object detection task in the time–frequency plane and introduce a direct adaptation of modern YOLO detectors to audio. To our knowledge, this is among the first works to employ YOLOv8 and YOLOv11 not merely as feature extractors but as end-to-end models that localize and classify sound events on mel-spectrograms. Methodologically, our approach (i) generates mel-spectrograms on the fly from raw audio to streamline the pipeline and enable transfer learning from vision models; (ii) applies curriculum learning that exposes the detector to progressively more complex mixtures, improving robustness to overlaps; and (iii) augments training with synthetic audio constructed under DCASE 2023 guidelines to enrich rare classes and challenging scenarios. Comprehensive experiments compare our YOLO-based framework against strong CRNN and Conformer baselines. In our experiments on the DCASE-style setting, the method achieves competitive detection accuracy relative to CRNN and Conformer baselines, with gains in some overlapping/noisy conditions and shortcomings for several short-duration classes. These results suggest that adapting modern object detectors to audio can be effective in this setting, while broader generalization and encoder-augmented comparisons remain open.
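Reframing SED as object detection in the time-frequency plane implies turning each annotated event into a bounding box on the spectrogram. The sketch below shows one way this could be done, using the common YOLO label layout (class, x center, y center, width, height, all normalized to the 0 to 1 range). The abstract does not describe how boxes are built, so the function `event_to_yolo_box`, its parameters, and the choice of a mel-bin range for the vertical extent are assumptions for illustration.

```python
def event_to_yolo_box(onset_s, offset_s, mel_low, mel_high, clip_s, n_mels, class_id):
    """Map a sound event to a normalized (class, xc, yc, w, h) box.

    Time runs along x and mel bins along y. Whether y grows upward or
    downward depends on how the spectrogram image is rendered, which
    this sketch does not model.
    """
    x_center = (onset_s + offset_s) / 2.0 / clip_s
    width = (offset_s - onset_s) / clip_s
    y_center = (mel_low + mel_high) / 2.0 / n_mels
    height = (mel_high - mel_low) / n_mels
    return (class_id, x_center, y_center, width, height)


# Hypothetical event: 2 s to 4 s in a 10 s clip, occupying mel bins 0 to 64 of 128.
box = event_to_yolo_box(2.0, 4.0, 0, 64, clip_s=10.0, n_mels=128, class_id=3)
```

Detections produced by the model would be mapped back the same way in reverse, with the box's horizontal extent giving the predicted onset and offset.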

  • Research Article
  • 10.3390/math13243948
Sound Event Detection Employing Segmental Model
  • Dec 11, 2025
  • Mathematics
  • Yong-Joo Chung

Segmental models compute likelihood scores in segment units instead of frame units to recognize sequence data. Motivated by some promising results in speech recognition and natural language processing, we apply segmental models to sound event detection for the first time and verify their effectiveness compared to the conventional frame-based approaches. The proposed model processes variable-length segments of sound signals by encoding feature vectors employing deep learning techniques. These encoded vectors are subsequently embedded to derive representative values for each segment, which are then scored to identify the best matches for each input sound signal. Owing to the inherent variation in lengths and types of input sound signals, segmental models incur high computational and memory costs. To address this issue, a simple segment-scoring function with efficient computation and memory usage is employed in our end-to-end model. We use marginal log loss as the cost function while training the segment model, which eliminates the reliance on strong labels for sound events. Experiments performed on the detection and classification of acoustic scenes and events challenge 2019 dataset reveal that the proposed method achieves a better F-score in sound event detection compared with conventional convolutional recurrent neural network-based models.

  • Research Article
  • 10.3844/jcssp.2025.2772.2801
Preventing Deforestation in the Indian Landscape Through Neural Network-Based Intelligence Using Sound Event Detection and Advanced Feature Extraction Techniques
  • Dec 1, 2025
  • Journal of Computer Science
  • Sallauddin Mohmmad + 1 more

Forests play a vital role in maintaining ecological balance, regulating the climate, and conserving biodiversity. However, India’s forest landscape has witnessed significant changes between 1980 and 2024 due to deforestation, afforestation, and evolving conservation strategies. To address the challenges associated with forest monitoring, we proposed a model based on Sound Event Detection using a dataset comprising four classes: chainsaw sounds, handsaw sounds, axe-cutting sounds (synthetic), and negative environmental sounds (e.g., birds, animals, wind). The dataset was constructed from publicly available resources, except for the axe-cutting sound class, which was prepared synthetically. The model employed six feature extraction techniques (Mel-Spectrogram, Mel-Frequency Cepstral Coefficients (MFCC), Chroma, Spectral Contrast, Tonnetz, and Spectral Bandwidth) to capture critical audio characteristics. These features enabled the efficient representation of harmonic content, temporal patterns, and timbre, which were essential for distinguishing between classes. The proposed approach was executed using various deep learning models, including Customized 1D Convolutional Neural Networks (CNN), Bi-directional Convolutional Recurrent Neural Networks (Bi-CRNN), Bi-directional Gated Recurrent Unit-based CRNNs (Bi-GRU-CRNN), AlexNet, and ResNet. The Customized-CNN, implemented using Keras, demonstrated superior performance with an accuracy of 98%. The model’s effectiveness was further validated as accuracy increased progressively from 95% to 98% when transitioning from two to six feature extraction clusters.

  • Research Article
  • 10.55041/ijsrem54276
Neural Network Architectures for Extracting Meaningful Representations from Audio Data
  • Nov 21, 2025
  • International Journal of Scientific Research in Engineering and Management
  • M Shubha + 3 more

Audio data carries rich information in the form of speech, music, and environmental sounds, but its raw waveform is often complex and high-dimensional, making direct analysis difficult. Neural network architectures have emerged as powerful tools for extracting meaningful representations from audio signals, enabling efficient analysis and interpretation. The work covers architectures such as Recurrent Neural Networks (RNNs) and Transformer-based models for learning robust and discriminative features from audio. By automatically capturing temporal, spectral, and contextual patterns, these architectures significantly improve performance in tasks such as speech recognition, speaker identification, music classification, and environmental sound detection. The findings highlight the potential of neural networks to replace traditional handcrafted features, thereby advancing the development of scalable, accurate, and real-time audio processing applications. The rapid growth of audio data across domains such as speech, music, healthcare, and environmental monitoring has created a strong need for effective methods to extract meaningful representations from complex audio signals. Traditional approaches rely on handcrafted features like MFCCs and spectrogram descriptors, which often fail to capture the full temporal and spectral dynamics present in raw audio. Keywords: Neural Networks, Deep Learning, Audio Representation Learning, Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), Transformer Models, Feature Extraction, Speech Recognition, Speaker Identification, Sound Event Detection, Spectrogram Analysis, Audio Signal Processing

  • Research Article
  • 10.1007/s00034-025-03409-x
An Interactive Feature Aggregation and Perception Network for 3D Sound Event Localization and Detection
  • Nov 15, 2025
  • Circuits, Systems, and Signal Processing
  • Yongbo Li + 1 more

  • Research Article
  • Cite Count Icon 4
  • 10.1109/tpami.2025.3593932
UniAV: Unified Audio-Visual Perception for Multi-Task Video Event Localization.
  • Nov 1, 2025
  • IEEE Transactions on Pattern Analysis and Machine Intelligence
  • Tiantian Geng + 6 more

Video event localization tasks include temporal action localization (TAL), sound event detection (SED) and audio-visual event localization (AVEL). Existing methods tend to over-specialize on individual tasks, neglecting the equal importance of these different events for a complete understanding of video content. In this work, we aim to develop a unified framework to solve TAL, SED and AVEL tasks together to facilitate holistic video understanding. However, it is challenging since different tasks emphasize distinct event characteristics and there are substantial disparities in existing task-specific datasets (size/domain/duration). It leads to unsatisfactory results when applying a naive multi-task strategy. To tackle the problem, we introduce UniAV, a Unified Audio-Visual perception network to effectively learn and share mutually beneficial knowledge across tasks and modalities. Concretely, we propose a unified audio-visual encoder to derive generic representations from multiple temporal scales for videos from all tasks. Meanwhile, task-specific experts are designed to capture the unique knowledge specific to each task. Besides, instead of using separate prediction heads, we develop a novel unified language-aware classifier by utilizing semantic-aligned task prompts, enabling our model to flexibly localize various instances across tasks with an impressive open-set ability to localize novel categories. Extensive experiments demonstrate that UniAV, with its unified architecture, significantly outperforms both single-task models and the naive multi-task baseline across all three tasks. It achieves superior or on-par performances compared to the state-of-the-art task-specific methods on ActivityNet 1.3, DESED and UnAV-100 benchmarks.

  • Research Article
  • Cite Count Icon 1
  • 10.1038/s41597-025-05991-w
Environmental Noise Dataset for Sound Event Classification and Detection.
  • Oct 29, 2025
  • Scientific Data
  • Luca Fredianelli + 5 more

Sound Event Classification (SEC) and Sound Event Detection (SED) are gaining momentum across various domains. With the rise of machine learning, identifying specific sound sources amid background noise in outdoor environments has become a major focus. Recognizable sound types are many and vary depending on context, ranging from vehicles, trains, and aircraft to human and animal activity. This work introduces two open-access datasets, DataSEC and DataSED, created to address gaps identified in the existing dataset literature. Together, they provide over 35 hours of authentic, non-synthesized .wav audio, collected from sound level meter measurements and online repositories. DataSEC consists of 4292 audio samples, with each sample representing a single event that has been classified into one of 22 defined sound classes and 28 subclasses. DataSED comprises 712 real-world recordings containing multiple events, accompanied by over 4000 labels provided in .csv format. These datasets extend across a range of urban to rural environments and have been designed to support research in real-world sound event classification and automated analysis of environmental noise.

  • Research Article
  • 10.3390/s25216505
Human–Machine Collaborative Learning for Streaming Data-Driven Scenarios
  • Oct 22, 2025
  • Sensors (Basel, Switzerland)
  • Fan Yang + 2 more

Deep learning has been broadly applied in many fields and has greatly improved efficiency compared to traditional approaches. However, it cannot resolve issues well when there is a lack of training samples, and in some varying cases it cannot give a clear output. Human beings and machines that work in a collaborative and equal mode to address complicated streaming data-driven tasks can achieve higher accuracy and clearer explanations. A novel framework is proposed which integrates human intelligence and machine intelligent computing, taking advantage of the strengths of both to work out complex tasks. Human beings are responsible for the highly decisive aspects of the task and provide empirical feedback to the model, whereas the machines undertake the repetitive computing aspects of the task. The framework is executed flexibly through an interactive human–machine cooperation mode, and it is more robust in recognizing some hard samples. We tested the framework using video anomaly detection, person re-identification, and sound event detection application scenarios, and we found that the human–machine collaborative learning mechanism obtained much better accuracy. After fusing human knowledge with deep learning processing, the final decision making is confirmed. In addition, we conducted abundant experiments to verify the effectiveness of the framework and obtained competitive performance at the cost of a small amount of human intervention. The approach is a new form of machine learning, especially in dynamic and untrustworthy conditions.
