Related Topics

  • Convolutional Neural Network Features
  • Mid-level Features
  • CNN Features
  • Mid-level Representation
  • Convolutional Features

Articles published on Fisher vector

321 Search results
  • Research Article
  • 10.1080/01969722.2025.2566665
A Novel Residual-Attention Deep Learning Model for Secure Multimodal Biometric Recognition
  • Sep 27, 2025
  • Cybernetics and Systems
  • Madhumitha Rajendran + 1 more

In contemporary security applications, unimodal biometric systems are prone to spoofing and environmental variability. To overcome these challenges, MBMRAN (Multibiometric Modified Residual Attention Network), an innovative deep learning framework for multimodal biometric identification, is introduced. MBMRAN leverages 1D convolution-based residual blocks combined with a multi-scale encoder-decoder attention mechanism, enhancing feature extraction and adaptive fusion across four biometric sources: face, iris, palmprint, and fingerprint. Unlike traditional approaches, MBMRAN employs a refined attention mask that preserves critical information while amplifying discriminative cues. Feature-level fusion is optimized through Fisher Vector encoding and Gaussian Mixture Models, enabling robust integration of modality-specific features. Evaluated on the CASIA dataset (305 identities), MBMRAN achieved 94.2% accuracy, surpassing classical models (SVM, KNN, RF) and outperforming several pre-trained deep CNNs. A systematic ablation analysis confirms the essential roles of residual layering, skip connections, and attention scaling. While effective on the CASIA dataset, the model’s depth opens future opportunities for optimization and evaluation across diverse biometric benchmarks. The model also achieved high precision, recall, and F1-scores, validating its generalization and reliability. MBMRAN presents a scalable and computationally efficient solution for identity verification, suited for domains such as access control, smart surveillance, and IoT authentication.

  • Research Article
  • 10.1142/s0219519425400706
A NOVEL MULTI-SCALE FEATURE FUSION APPROACH FOR DYSARTHRIA SEVERITY ASSESSMENT: LEVERAGING EMOTIONAL AND INTELLIGIBILITY CUES IN SPEECH
  • Aug 7, 2025
  • Journal of Mechanics in Medicine and Biology
  • Hongmin Lv + 2 more

The assessment of dysarthria severity directly reflects the progression of a patient’s condition and serves as a crucial baseline for developing targeted intervention programs. Emotional characteristics embedded in dysarthric speech not only record the emotional state of patients but also assist clinicians in understanding their mental health status and advancing subsequent treatment. We innovatively designed a multi-scale feature fusion module that integrates speech emotion features with intelligibility characteristics using Fisher vector encoding, thereby enhancing the richness of input features in automated dysarthria assessment systems. In this study, we conducted multiple comparative experiments using different acoustic features and deep learning techniques. The results demonstrate that our multi-scale feature approach achieves an accuracy of 98.56% with the deep neural networks (DNNs) classification model and an impressive 96.13% with the support vector machine (SVM). These findings validate the effectiveness of the multi-scale feature fusion approach in dysarthria severity level assessment and provide new perspectives for the medical diagnosis of dysarthria.

  • Research Article
  • Citations: 2
  • 10.48084/etasr.10767
Relevance-Aware Content-based Image Retrieval using Deep Hybrid Feature Extraction
  • Jun 4, 2025
  • Engineering, Technology & Applied Science Research
  • Ranjeet Kumar + 1 more

Content-Based Image Retrieval (CBIR) requires balancing feature representation quality, computational efficiency, and robust performance across diverse image domains. Traditional methods lack semantic understanding, whereas deep learning approaches often exclude critical local structural information. This study presents a novel hybrid framework that effectively combines Histogram of Oriented Gradients (HOG) with EfficientNet through a two-stream architecture, enhanced by a query-sensitive co-attention mechanism and Fisher vector encoding. The framework employs an adaptive fusion strategy that dynamically adjusts feature contributions based on the query context and overcomes key limitations of existing approaches. Experimental evaluation on benchmark datasets demonstrates superior performance, achieving mean Average Precision scores of 0.89, 0.85, and 0.83 on Corel-1K, Oxford5K, and Paris6K datasets, respectively, representing a 3-5% improvement over state-of-the-art methods. The framework shows particular effectiveness in handling challenging scenarios such as viewpoint variations and partial occlusions, with landmark queries achieving a Precision@10 of 0.92. Comprehensive ablation studies validate the contribution of each component, with HOG feature integration and attention mechanism improving performance by 4.2% and 3.8%, respectively. The proposed approach successfully bridges the gap between traditional and deep learning methods while maintaining computational efficiency for practical applications.

  • Research Article
  • Citations: 2
  • 10.1121/10.0036660
Automatic detection of Parkinsonian speech using wavelet scattering features.
  • May 1, 2025
  • JASA express letters
  • Mittapalle Kiran Reddy + 1 more

In this paper, we study the automatic detection of Parkinson's disease (PD) from speech using features computed by a two-layer wavelet scattering network, which generates locally stable and translation-invariant features at each layer. The scattering features are encoded using Fisher vectors to obtain a single fixed-size feature vector per utterance. Support vector machine and feed-forward neural network classifiers are trained using the utterance-level features to perform the detection task (healthy vs PD). The results obtained with the PC-GITA database revealed that the proposed approach shows better results in comparison to the state-of-the-art techniques. The best classification accuracy of 87% was achieved with the proposed approach using speech from a text reading task.
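The encoding step described above, turning a variable number of local descriptors into one fixed-size vector, can be sketched as follows. This is a minimal illustration of the standard Fisher vector with respect to the means and variances of a diagonal-covariance Gaussian mixture model, not the authors' implementation. The function names, the toy GMM parameters, and the toy "utterances" are all assumptions made for the example; in practice the GMM would be fitted to training descriptors first.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at point x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )

def fisher_vector(descriptors, weights, means, variances):
    """Encode T local descriptors (each of dimension D) as one 2*K*D vector.

    weights, means, variances describe an already-fitted K-component
    diagonal GMM (hypothetical values in this sketch).
    """
    T, K, D = len(descriptors), len(weights), len(means[0])
    g_mu = [[0.0] * D for _ in range(K)]
    g_var = [[0.0] * D for _ in range(K)]
    for x in descriptors:
        # Posterior (soft assignment) of each component, computed stably.
        log_p = [math.log(weights[k]) + log_gauss_diag(x, means[k], variances[k])
                 for k in range(K)]
        top = max(log_p)
        unnorm = [math.exp(lp - top) for lp in log_p]
        total = sum(unnorm)
        for k in range(K):
            gamma = unnorm[k] / total
            for d in range(D):
                z = (x[d] - means[k][d]) / math.sqrt(variances[k][d])
                g_mu[k][d] += gamma * z                # gradient w.r.t. mean
                g_var[k][d] += gamma * (z * z - 1.0)   # gradient w.r.t. variance
    fv = []
    for k in range(K):
        fv.extend(g / (T * math.sqrt(weights[k])) for g in g_mu[k])
    for k in range(K):
        fv.extend(g / (T * math.sqrt(2.0 * weights[k])) for g in g_var[k])
    return fv

# Toy GMM with K=2 components in D=2 dimensions (assumed values).
weights = [0.5, 0.5]
means = [[0.0, 0.0], [3.0, 3.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]

# Two "utterances" with different numbers of local descriptors.
utterance_a = [[0.1, -0.2], [2.9, 3.1], [0.5, 0.4]]
utterance_b = [[3.2, 2.8], [0.0, 0.1], [2.7, 3.3], [0.3, -0.1], [3.0, 3.0]]

fv_a = fisher_vector(utterance_a, weights, means, variances)
fv_b = fisher_vector(utterance_b, weights, means, variances)
print(len(fv_a), len(fv_b))  # both have length 2 * K * D
```

Both inputs map to vectors of the same length regardless of how many descriptors they contain, which is what allows an utterance-level classifier such as an SVM to be trained on them.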

  • Research Article
  • Citations: 1
  • 10.1109/tim.2025.3565245
An Iterative Method Combining Fuzzy Fusion and Fisher Vectors for Concealed Object Detection in Passive Millimeter-Wave Imaging
  • Jan 1, 2025
  • IEEE Transactions on Instrumentation and Measurement
  • Bo Fang + 7 more

Passive millimeter-wave (PMMW) imaging technology holds potential for security checks because it reveals brightness temperature (BT) differences between concealed objects and the human body. However, existing region-based methods for detecting objects in PMMW multi-polarization imaging suffer from poor regional segmentation. To address this, this paper presents an iterative method combining fuzzy fusion and Fisher vectors, named ICFFFV. First, the multi-polarization membership degree vectors from contrast images of low BT (CILBTs) are analyzed. This leads to a multi-polarization fuzzy fusion that constructs potential target regions. Next, a new index searching-assisted re-segmentation strategy is introduced, based on analyses of inaccurate segmentation and of superpixels near the body edge. It mitigates the impact of inaccurate object segmentation and superpixel interference near the body edge. Finally, the SCFV image is combined with the fuzzy fusion image. Applied in a feedback-based iterative manner, this protects the edge pixels of objects while suppressing clutter pixels. Experiments validate the enhancement performance and show improved detection at both the pixel and region levels.

  • Research Article
  • Citations: 4
  • 10.1109/tmm.2023.3330338
Steerable Graph Neural Network on Point Clouds via Second-Order Random Walks
  • Jan 1, 2025
  • IEEE Transactions on Multimedia
  • Xianglin Guo + 5 more

Point cloud analysis, arising from computer graphics, remains a fundamental but challenging problem, mainly due to the non-Euclidean nature of point cloud data. With the rapid increase in the amount and breadth of research on deep learning for graphs, many important works represent point clouds as graphs. In this paper, we present a sampling adaptive graph convolutional network that combines the representation ability of random walk subgraph searching with the success of the Fisher vector. Extending existing graph representation learning or embedding methods that use multi-hop neighbor random searching, we sample multi-scale walk fields using a steerable exploration-exploitation second-order random walk, which gives our model more flexibility than the original first-order random walk. To encode each scale's walk field, which consists of several walk paths, we characterize the paths by Gaussian mixture models (GMMs) so as to better parallel standard CNNs on Euclidean data. Each Gaussian component implicitly defines a direction, and together they encode the spatial layout of the walk fields after the gradient is projected onto the space of Gaussian parameters, i.e., the Fisher vectors. We name the resulting deep graph convolutional network PointFisher. Comprehensive evaluations on several public datasets demonstrate the superiority of the proposed method over other state-of-the-art methods for point cloud classification and segmentation.

  • Research Article
  • Citations: 1
  • 10.1109/tgrs.2025.3543174
Supervoxel-Based Instance Segmentation of Pole-Like Facilities From Mobile Laser Scanning Data Using Pyramid Cascaded Fisher Vector Modeling
  • Jan 1, 2025
  • IEEE Transactions on Geoscience and Remote Sensing
  • Longjie Ye + 2 more

Efficient and automatic object recognition in road scenes plays an essential role in smart city applications such as autonomous driving and intelligent infrastructure. As an important component of road scenes, pole-like facilities (PLFs) have been challenging to recognize for high-definition road mapping. To achieve the automatic recognition of PLFs from cluttered mobile laser scanning (MLS) data, a novel instance segmentation method is proposed. First, candidate poles are detected by a supervoxel-based histogram analysis of partitioned off-ground point clouds. Then, instance segmentation of PLFs is achieved through a constrained hierarchical region-growing algorithm based on voxelized point clouds. A pyramid cascaded Fisher vector (FV) model and a random forest (RF) classifier are applied to classify the delineated pole-like road facilities into six predefined categories: trees, traffic signs, traffic lights, lamps, bare poles, and other objects. The proposed method is tested on three datasets collected in street scenes with different types of road facilities and point densities. Results demonstrate that the method can effectively achieve instance segmentation of PLFs in complex road environments. The proposed method outperformed the state of the art for PLF detection in correctness (93.5%), completeness (95.3%), and quality (88.81%). In addition, it achieved satisfactory results for instance-level semantic segmentation, with an average F1 score of 90.2%, demonstrating the effectiveness of geometric information enhancement in the designed FV coding approach.

  • Research Article
  • 10.1504/ijcse.2025.10070188
Face spoofing detection using noise-based random feature and Fisher vector encoding
  • Jan 1, 2025
  • International Journal of Computational Science and Engineering
  • Na Yang + 7 more

With the widespread application of face recognition technology, its security risks have increased because systems are vulnerable to spoofing attacks with falsified faces, which has attracted many researchers' attention. In this paper, we propose to use noise information in colour space to detect spoofing attacks. First, we extract frame-based noise from face videos in multiple colour spaces. Then, local random features are extracted via random projection. Finally, Fisher vector encoding is employed to aggregate these features into global feature vectors, and a classification model is trained for detection. Experimental results on three standard face spoofing databases demonstrate the effectiveness of the approach. The equal error rate on the replay attack database is 0%. On the CASIA and MSU databases, the equal error rates are 3.52% and 0%, respectively. By combining noise-based random features and Fisher vector encoding, this method effectively resists photo- and video-based spoofing attacks.

  • Research Article
  • 10.1504/ijcse.2025.149762
Face spoofing detection using noise-based random feature and Fisher vector encoding
  • Jan 1, 2025
  • International Journal of Computational Science and Engineering
  • Fang Xu + 7 more

With the widespread application of face recognition technology, its security risks have increased because systems are vulnerable to spoofing attacks with falsified faces, which has attracted many researchers' attention. In this paper, we propose to use noise information in colour space to detect spoofing attacks. First, we extract frame-based noise from face videos in multiple colour spaces. Then, local random features are extracted via random projection. Finally, Fisher vector encoding is employed to aggregate these features into global feature vectors, and a classification model is trained for detection. Experimental results on three standard face spoofing databases demonstrate the effectiveness of the approach. The equal error rate on the replay attack database is 0%. On the CASIA and MSU databases, the equal error rates are 3.52% and 0%, respectively. By combining noise-based random features and Fisher vector encoding, this method effectively resists photo- and video-based spoofing attacks.

  • Research Article
  • Citations: 12
  • 10.1016/j.engappai.2024.108291
Composite descriptor based on contour and appearance for plant species identification
  • Mar 18, 2024
  • Engineering Applications of Artificial Intelligence
  • Hao Wu + 3 more

  • Research Article
  • Citations: 4
  • 10.1371/journal.pone.0298228
Can using a pre-trained deep learning model as the feature extractor in the bag-of-deep-visual-words model always improve image classification accuracy?
  • Feb 29, 2024
  • PLOS ONE
  • Ye Xu + 3 more

This article investigates whether higher classification accuracy can always be achieved by using a pre-trained deep learning model as the feature extractor in the Bag-of-Deep-Visual-Words (BoDVW) classification model, as opposed to directly using the new classification layer of the pre-trained model for classification. Multiple factors relate to the feature extractor, such as model architecture, fine-tuning strategy, number of training samples, feature extraction method, and feature encoding method. We investigate these factors through experiments and then provide detailed answers to the question. In our experiments, we use five feature encoding methods: hard-voting, soft-voting, locally constrained linear coding, super vector coding, and Fisher vector (FV). We also employ two popular feature extraction methods: one (denoted as Ext-DFs(CP)) uses a convolutional or non-global pooling layer, and another (denoted as Ext-DFs(FC)) uses a fully-connected or global pooling layer. Three pre-trained models (VGGNet-16, ResNext-50(32×4d), and Swin-B) are used as feature extractors. Experimental results on six datasets (15-Scenes, TF-Flowers, MIT Indoor-67, COVID-19 CXR, NWPU-RESISC45, and Caltech-101) reveal that, compared to using the pre-trained model with only the new classification layer re-trained for classification, employing it as the feature extractor in the BoDVW model improves the accuracy in 35 out of 36 experiments when using FV. With Ext-DFs(CP), the accuracy increases by 0.13% to 8.43% (3.11% on average), and with Ext-DFs(FC), it increases by 1.06% to 14.63% (5.66% on average). Furthermore, when all layers of the pre-trained model are fine-tuned and used as the feature extractor, the results vary depending on the methods used. If FV and Ext-DFs(FC) are used, the accuracy increases by 0.21% to 5.65% (1.58% on average) in 14 out of 18 experiments. Our results suggest that while using a pre-trained deep learning model as the feature extractor does not always improve classification accuracy, it holds great potential as an accuracy improvement technique.

  • Research Article
  • 10.1109/access.2023.3339575
Transfer Learning Models for CNN Fusion With Fisher Vector for Codebook Optimization of Foreground Features
  • Jan 1, 2024
  • IEEE Access
  • Mohamed Gamal M Kamaleldin + 2 more

Human action recognition has become one of the main topics in the computer vision field due to its high demand and competitiveness in real-world applications. The main goals of human action recognition are to improve classification accuracy and reduce computational complexity. Previous studies have mainly used two approaches: the hand-crafted feature extraction approach and the deep learning approach. The hand-crafted approach is simple, which gives it an advantage in computational complexity, but its accuracy is low. Conversely, the deep learning approach achieves high accuracy even for complex datasets, but it suffers from high computational complexity and long training time because it needs to process huge datasets during training. Other approaches use pre-trained deep learning networks to fuse both methods. In this paper, we introduce a combination of pre-trained convolutional neural networks (CNNs) to extract features, an improved Fisher vector (iFV) codebook, and an optimized support vector machine (SVM) to achieve improved human action recognition. We leveraged three pre-trained CNNs, namely Inception-ResNet-v2, NASNet-Large, and Xception, to extract the features. Then, we applied the improved Fisher vector codebook to encode them. We subsequently trained the codebook using SVM for classification and re-adjusted the SVM weights using five different optimization techniques: SGD, Adagrad, ADAM, Adamax, and Nadam. To evaluate the performance, we used the UCF101 and HMDB51 datasets. The results demonstrate that the accuracy and computational complexity of our approach are comparable to state-of-the-art techniques.
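In the wider literature, "improved Fisher vector" usually refers to a Fisher vector followed by signed power normalization and L2 normalization. The abstract does not state which steps this paper applies, so the sketch below illustrates only that common convention; the function name `improved_normalize` and the default `alpha=0.5` are assumptions made for the example.

```python
import math

def improved_normalize(fv, alpha=0.5):
    """Apply signed power normalization, then L2 normalization.

    fv is a raw Fisher vector given as a list of floats. alpha=0.5 is the
    commonly used "signed square root" setting (an assumption here).
    """
    # Power normalization: shrink large magnitudes while keeping the sign.
    powered = [math.copysign(abs(v) ** alpha, v) for v in fv]
    # L2 normalization: scale to unit Euclidean length when possible.
    norm = math.sqrt(sum(v * v for v in powered))
    if norm == 0.0:
        return powered
    return [v / norm for v in powered]

raw = [4.0, -9.0, 0.0]
normalized = improved_normalize(raw)
print(normalized)
```

The normalized vector can then be passed to a linear classifier such as an SVM; the power step reduces the dominance of a few large components, and the L2 step makes vectors from inputs of different sizes comparable.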

  • Open Access
  • Research Article
  • 10.2197/ipsjjip.32.520
Anomaly Detection of Building Structure from Incomplete Point Cloud Obtained by UAV
  • Jan 1, 2024
  • Journal of Information Processing
  • Ayumu Harada + 2 more

Japan has been severely impacted by natural disasters, including earthquakes such as the Great Hanshin-Awaji Earthquake in 1995 and the Kumamoto Earthquake in 2016. These seismic events have underscored the high number of casualties that result from individuals becoming trapped in collapsed buildings or affected by fires, thereby accentuating the need for building-specific earthquake assessments. Although experts have performed detailed analyses using automated satellite imagery and UAV-captured photos for longer-term objectives such as secondary disaster prevention, reconstruction, and insurance claim verification, these require a substantial amount of time to implement. This paper introduces two methods that employ UAVs to rapidly detect anomalous building structures, allowing for the simultaneous observation of multiple buildings. First, we present a method for identifying building structures from incomplete three-dimensional point clouds acquired by unmanned aerial vehicles (UAVs) that move faster over the target area in a limited time for rapid assessment. The method efficiently identifies the structural characteristics of buildings by operating under certain geometric assumptions, such as the angles between building sides being approximately 90 degrees and vertical consistency in building shape. We also present a method for identifying collapsed buildings by extracting Fisher vector and normal histogram features from point clouds and using a machine-learning model for detection. In our evaluations, we show that when observations are limited to less than half of the building structure, the first method can successfully recognize the geometric shape of 70% of the undamaged buildings. In the experiments for the second method, both feature extraction methods achieved Receiver Operating Characteristic Area Under the Curve (ROC-AUC) values greater than 0.99.

  • Research Article
  • Citations: 9
  • 10.54216/fpa.160205
Employing Deep Learning Techniques for the Identification and Assessment of Skin Cancer
  • Jan 1, 2024
  • Fusion: Practice and Applications
  • S S + 5 more

Skin cancer is now a prominent cause of death. The term refers to the abnormal growth of skin cells, most often in areas exposed to the sun, although it can develop anywhere on the human body. The majority of malignancies are treatable in the early stages, so early detection of skin cancer is important for preserving patients' lives. Modern techniques make such early detection possible. Here, we present a novel framework for the recognition of dermoscopy images that combines a local descriptor encoding method with a deep learning technique. Specifically, deep representations of a rescaled dermoscopy image are first extracted using a very deep residual neural network trained on a large dataset of natural images. Subsequently, the local deep descriptors are aggregated into orderless visual statistic features, which rely on Fisher vector encoding to create a global image representation. Lastly, a convolutional neural network (CNN) is used to classify melanoma images from the Fisher vector encoded representations. The proposed technique can produce more discriminative features for handling large variations within melanoma classes and small variations between melanoma and non-melanoma classes with limited training data.

  • Research Article
  • Citations: 12
  • 10.1016/j.aej.2023.11.077
Moving scene object tracking method based on deep convolutional neural network
  • Dec 14, 2023
  • Alexandria Engineering Journal
  • Long Liu + 2 more

  • Open Access
  • Research Article
  • Citations: 8
  • 10.1145/3596909
Alabib-65: A Realistic Dataset for Algerian Sign Language Recognition
  • Jun 17, 2023
  • ACM Transactions on Asian and Low-Resource Language Information Processing
  • Kenza Khellas + 1 more

Sign language recognition (SLR) is a promising research field that aims to blur boundaries between Deaf and hearing people by creating a system that can transcribe signs into a written or vocal language. There is a growing body of literature that investigates the recognition of different sign languages, especially American Sign Language. So far, to the best of our knowledge, no study has considered Algerian SLR, mainly due to the lack of datasets. To address this issue, we created Alabib-65, the first Algerian Sign Language dataset. It consists of up to 6,238 videos recorded from 41 native signers under realistic settings. This dataset is challenging for several reasons. First, there is little inter-class variability: the 65 sign classes are similar in terms of hand configuration, placement, or movement and can share the same sub-parts. Second, there is large intra-class variability. Furthermore, compared to other SL datasets that were collected in an indoor environment with a static and simple background, our videos were recorded in both indoor and outdoor environments with 22 backgrounds varying from simple to cluttered and from static to dynamic. To underpin future research, we provide baseline results on this new dataset using state-of-the-art machine learning methods, namely IDTFs with Fisher vector and SVM classifier, VGG16-GRU, I3D, I3D-GRU, and I3D-GRU-Attention. The results show the validity and the challenges of our dataset.

  • Research Article
  • Citations: 37
  • 10.1109/tmtt.2022.3230940
Regional-Based Object Detection Using Polarization and Fisher Vectors in Passive Millimeter-Wave Imaging
  • Jun 1, 2023
  • IEEE Transactions on Microwave Theory and Techniques
  • Yayun Cheng + 6 more

Passive millimeter-wave (PMMW) imaging is a powerful approach for detecting hidden objects underneath clothing. The theoretical basis of object detection methods is the contrast of brightness temperature (TB) image. TB differences may be caused by the diversity of material, physical temperature, surface structure, and so on. Existing methods are mainly based on single-polarization and single-pixel processing, which usually generate many discrete pixels of false alarms or missing detections. In this article, we present a regional-based method for hidden object detection using polarization and Fisher vector (FV) features. The necessity of polarization averaging for detection is revealed by theoretical simulation and experimental analyses. Based on the superpixel segmentation of polarization mean image, a modified FV, regional mean FV (RMFV), is created to extract concealed object features. Various imaging experimental data of typical security inspection scenarios are applied to verify the proposed method. The robustness and effectiveness are proved by comparing with several state-of-the-art methods.

  • Research Article
  • Citations: 10
  • 10.1016/j.compbiomed.2023.107026
Learning binary and sparse permutation-invariant representations for fast and memory efficient whole slide image search
  • May 22, 2023
  • Computers in Biology and Medicine
  • Sobhan Hemati + 3 more

  • Research Article
  • Citations: 4
  • 10.1016/j.dsp.2023.104062
Acoustic Scene Classification using Deep Fisher network
  • May 3, 2023
  • Digital Signal Processing
  • Spoorthy Venkatesh + 2 more

  • Open Access
  • Research Article
  • Citations: 15
  • 10.1109/taffc.2021.3101698
Looking at the Body: Automatic Analysis of Body Gestures and Self-Adaptors in Psychological Distress
  • Apr 1, 2023
  • IEEE Transactions on Affective Computing
  • Weizhe Lin + 4 more

Psychological distress is a significant and growing issue in society. Automatic detection, assessment, and analysis of such distress is an active area of research. Compared to modalities such as face, head, and vocal, research investigating the use of the body modality for these tasks is relatively sparse. This is, in part, due to the limited available datasets and difficulty in automatically extracting useful body features. Recent advances in pose estimation and deep learning have enabled new approaches to this modality and domain. To enable this research, we have collected and analyzed a new dataset containing full body videos for short interviews and self-reported distress labels. We propose a novel method to automatically detect self-adaptors and fidgeting, a subset of self-adaptors that has been shown to be correlated with psychological distress. We perform analysis on statistical body gestures and fidgeting features to explore how distress levels affect participants' behaviors. We then propose a multi-modal approach that combines different feature representations using Multi-modal Deep Denoising Auto-Encoders and Improved Fisher Vector Encoding. We demonstrate that our proposed model, combining audio-visual features with automatically detected fidgeting behavioral cues, can successfully predict distress levels in a dataset labeled with self-reported anxiety and depression levels.
