Scene Context Research Articles

Occlusion relationship reasoning aims to locate where an object occludes others and to estimate the depth order of these objects in three-dimensional (3D) space from a two-dimensional (2D) image. The former sub-task demands both the accurate location and the semantic indication of the objects, while the latter sub-task needs the depth order among the objects. Although several insightful studies have been proposed, a key characteristic of occlusion relationship reasoning, i.e., the specialty and complementarity between occlusion boundary detection and occlusion orientation estimation, is rarely discussed. To address this gap, in this paper we integrate these properties into a unified end-to-end trainable network, namely the feature separation and interaction network (FSINet). It contains a shared encoder-decoder structure to learn the complementary property between the two sub-tasks, and two separated paths to learn the specialized properties of each sub-task. Concretely, the occlusion boundary path contains an image-level cue extractor to capture rich location information of the boundary, a detail-perceived semantic feature extractor, and a contextual correlation extractor to acquire refined semantic features of objects. In addition, a dual-flow cross detector is designed to alleviate false-positive and false-negative boundaries. For the occlusion orientation estimation path, a scene context learner is designed to capture the depth order cue around the boundary, and two stripe convolutions are built to judge the depth order between objects. The shared decoder supplies the feature interaction, which plays a key role in exploiting the complementarity of the two paths. Extensive experimental results on the PIOD and BSDS ownership datasets show that FSINet outperforms state-of-the-art alternatives. Additionally, extensive ablation studies demonstrate the effectiveness of our design.

In this paper, we propose lightweight deep neural networks for Acoustic Scene Classification (ASC) and a visualization method for presenting a sound scene context. To this end, we first propose an inception-based, low-memory-footprint ASC model as the ASC baseline. The ASC baseline is then compared with benchmark and high-complexity network architectures. Next, we improve the ASC baseline by proposing a novel deep neural network architecture which leverages a residual-inception architecture and multiple kernels. Given the novel residual-inception (NRI) based model, we apply multiple model compression techniques to evaluate the trade-off between model complexity and model accuracy. Finally, we evaluate whether sound events detected in a sound scene recording can help to improve ASC accuracy and to present the sound scene context more comprehensively. We conduct extensive experiments on various ASC datasets, including sound scene datasets proposed for IEEE AASP Challenge on Detection and Classification of Acoustic Scenes and Events (DCASE) 2018 Task 1A and 1B, 2019 Task 1A and 1B, 2020 Task 1A, 2021 Task 1A, and 2022 Task 1. Our experimental results on several different ASC challenges highlight two main achievements. First, given the analysis of the trade-off between model performance and model complexity, we propose two low-complexity ASC models: the medium-size model (MM) has 4.96 M trainable parameters, 19.3 MB memory occupation, and 7.12 BFLOPs; the small-size model (SM) has a very low complexity of 120 K trainable parameters, 120 KB memory occupation, and 0.82 BFLOPs. These ASC systems are competitive with state-of-the-art systems and suitable for real-life applications on a wide range of edge devices. Second, from the analysis of the role of sound events in a sound scene, we propose an effective visualization method for comprehensively presenting a sound scene context. By combining both the sound scene and sound event information, the visualization method not only indicates predicted sound scene contexts with high probabilities but also provides statistics of sound events occurring in these sound scene contexts.
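
The parameter counts and memory figures quoted in this abstract can be related with a back-of-the-envelope calculation. The sketch below is not taken from the paper: it assumes that weight storage dominates the reported memory occupation and that MB and KB are decimal units, and the function names are hypothetical. Under those assumptions, the medium-size model works out to roughly 4 bytes per parameter and the small-size model to about 1 byte per parameter.

```python
def weight_memory_bytes(num_params: int, bytes_per_param: float) -> float:
    """Estimate weight storage only (ignores activations and file overhead)."""
    return num_params * bytes_per_param


def implied_bytes_per_param(memory_bytes: float, num_params: int) -> float:
    """Back out the average storage per parameter from a reported size."""
    return memory_bytes / num_params


# Figures quoted in the abstract (decimal units are an assumption).
MM_PARAMS, MM_MEMORY = 4_960_000, 19.3e6   # 4.96 M parameters, 19.3 MB
SM_PARAMS, SM_MEMORY = 120_000, 120e3      # 120 K parameters, 120 KB

if __name__ == "__main__":
    # Hypothetical storage formats, for comparison with the reported sizes.
    print("MM at 4 bytes/param:", weight_memory_bytes(MM_PARAMS, 4) / 1e6, "MB")
    print("SM at 1 byte/param: ", weight_memory_bytes(SM_PARAMS, 1) / 1e3, "KB")

    # Storage per parameter implied by the reported memory occupation.
    print("MM implied bytes/param:",
          round(implied_bytes_per_param(MM_MEMORY, MM_PARAMS), 2))
    print("SM implied bytes/param:",
          round(implied_bytes_per_param(SM_MEMORY, SM_PARAMS), 2))
```

A ratio near 4 would be consistent with 32-bit weights and a ratio near 1 with 8-bit weights, but the abstract does not state the storage format, so this reading remains an assumption.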

Articles published on Scene Context

In the Eye of Transformer: Global–Local Correlation for Egocentric Gaze Estimation and Beyond

Occlusion relationship reasoning with a feature separation and interaction network

Exploring the Growing Importance of Forensic Geoarchaeology in Italy

A multi-modal vehicle trajectory prediction framework via conditional diffusion model: A coarse-to-fine approach

Viewpoint dependence and scene context effects generalize to depth rotated three-dimensional objects

Aging attenuates the memory advantage for unexpected objects in real-world scenes

Discriminative target predictor based on temporal-scene attention context enhancement and candidate matching mechanism

Deep Sensing for Compressive Video Acquisition

Multi-attention network for pedestrian intention prediction based on spatio-temporal feature fusion

Vehicle Trajectory Prediction in Connected Environments via Heterogeneous Context-Aware Graph Convolutional Networks

The influence of scene context on individual and ensemble encoding of object positions

The role of scene context in object recognition by humans and convolutional neural networks

Scene context is predictive of unconstrained object similarity judgments

The Spatial Precision of Contextual Feedback Signals in Human V1

Language-Assisted 3D Feature Learning for Semantic Scene Understanding

Objects sharpen visual scene representations: evidence from MEG decoding

Bidirectional Domain Mixup for Domain Adaptive Semantic Segmentation

PDRF: Progressively Deblurring Radiance Field for Fast Scene Reconstruction from Blurry Images

Lightweight deep neural networks for acoustic scene classification and an effective visualization for presenting sound scene contexts

Scene context automatically drives predictions of object transformations
