WheelPoser: Sparse-IMU Based Body Pose Estimation for Wheelchair Users

Abstract

Although researchers have extensively studied ways to track body pose on the go, most prior work does not account for wheelchair users, leading to poor tracking performance. Wheelchair users could greatly benefit from this pose information to prevent injuries, monitor their health, identify environmental accessibility barriers, and interact with gaming and VR experiences. In this work, we present WheelPoser, a real-time pose estimation system specifically designed for wheelchair users. Our system uses only four strategically placed IMUs on the user’s body and wheelchair, making it far more practical than prior systems that rely on cameras or dense IMU arrays. WheelPoser tracks a wheelchair user’s pose with a mean joint angle error of 14.30° and a mean joint position error of 6.74 cm, more than three times better than similar systems using sparse IMUs. To train our system, we collected the novel WheelPoser-IMU dataset, consisting of 167 minutes of paired IMU sensor and motion capture data of people in wheelchairs, including wheelchair-specific motions such as propulsion and pressure relief. Finally, we explore the application space enabled by our system and discuss future opportunities. Open-source code, models, and the dataset can be found at: https://github.com/axle-lab/WheelPoser.
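The two metrics quoted above are standard in sparse-IMU pose work. As a rough illustration (not WheelPoser's actual evaluation code; the function names and array shapes are assumptions), they can be computed as:

```python
import numpy as np

def mean_joint_position_error(pred, gt):
    """Mean Euclidean distance between predicted and ground-truth
    joint positions, averaged over all frames and joints.

    pred, gt: arrays of shape (frames, joints, 3), in the same units.
    """
    return float(np.linalg.norm(pred - gt, axis=-1).mean())

def mean_joint_angle_error(pred_deg, gt_deg):
    """Mean absolute joint-angle difference in degrees, wrapped to
    [0, 180] so that e.g. 350 deg vs 10 deg counts as a 20 deg error."""
    diff = np.abs(pred_deg - gt_deg) % 360.0
    return float(np.minimum(diff, 360.0 - diff).mean())
```

Joint positions must share one unit (e.g. cm) for the position error to be comparable to the 6.74 cm figure.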

Similar Papers
  • Research Article
  • Cited by 29
  • 10.1016/j.heliyon.2024.e36589
Deep learning-based human body pose estimation in providing feedback for physical movement: A review
  • Aug 26, 2024
  • Heliyon
  • Atima Tharatipyakul + 2 more

  • Research Article
  • Cited by 45
  • 10.1007/s11263-008-0158-0
Learning Generative Models for Multi-Activity Body Pose Estimation
  • Jul 31, 2008
  • International Journal of Computer Vision
  • Tobias Jaeggli + 2 more

We present a method to simultaneously estimate 3D body pose and action categories from monocular video sequences. Our approach learns a generative model of the relationship of body pose and image appearance using a sparse kernel regressor. Body poses are modelled on a low-dimensional manifold obtained by Locally Linear Embedding dimensionality reduction. In addition, we learn a prior model of likely body poses and a dynamical model in this pose manifold. Sparse kernel regressors capture the nonlinearities of this mapping efficiently. Within a Recursive Bayesian Sampling framework, the potentially multimodal posterior probability distributions can then be inferred. An activity-switching mechanism based on learned transfer functions allows for inference of the performed activity class, along with the estimation of body pose and 2D image location of the subject. Using a rough foreground segmentation, we compare Binary PCA and distance transforms to encode the appearance. As a postprocessing step, the globally optimal trajectory through the entire sequence is estimated, yielding a single pose estimate per frame that is consistent throughout the sequence. We evaluate the algorithm on challenging sequences with subjects that are alternating between running and walking movements. Our experiments show how the dynamical model helps to track through poorly segmented low-resolution image sequences where tracking otherwise fails, while at the same time reliably classifying the activity type.
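The Recursive Bayesian Sampling framework named in the abstract is, at its core, a particle filter. Below is a minimal sketch on a 1D latent coordinate (standing in for a point on the learned pose manifold); the random-walk dynamics and Gaussian likelihood are simplistic placeholders, not the paper's learned models:

```python
import numpy as np

rng = np.random.default_rng(0)

def particle_filter_step(particles, weights, observation,
                         motion_std=0.1, obs_std=0.2):
    """One Recursive Bayesian Sampling step on a 1D latent pose
    coordinate: propagate particles under a (here trivial) dynamical
    model, reweight by the observation likelihood, then resample."""
    # Predict: diffuse particles under random-walk dynamics.
    particles = particles + rng.normal(0.0, motion_std, size=particles.shape)
    # Update: Gaussian likelihood of the observation given each particle.
    weights = weights * np.exp(-0.5 * ((observation - particles) / obs_std) ** 2)
    weights = weights / weights.sum()
    # Resample: draw particles proportionally to their weights.
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    return particles[idx], np.full(len(particles), 1.0 / len(particles))
```

Because the posterior is represented by samples, it can stay multimodal, which is the property the paper exploits for activity switching.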

  • Research Article
  • Cited by 42
  • 10.1162/evco.2008.16.4.509
Human Body Pose Estimation with Particle Swarm Optimisation
  • Dec 1, 2008
  • Evolutionary Computation
  • Špela Ivekovič + 2 more

In this paper we address the problem of human body pose estimation from still images. A multi-view set of images of a person sitting at a table is acquired and the pose estimated. Reliable and efficient pose estimation from still images represents an important part of more complex algorithms, such as tracking human body pose in a video sequence, where it can be used to automatically initialise the tracker on the first frame. The quality of the initialisation influences the performance of the tracker in the subsequent frames. We formulate the body pose estimation as an analysis-by-synthesis optimisation algorithm, where a generic 3D human body model is used to illustrate the pose and the silhouettes extracted from the images are used as constraints. A simple test with gradient descent optimisation run from randomly selected initial positions in the search space shows that a more powerful optimisation method is required. We investigate the suitability of the Particle Swarm Optimisation (PSO) for solving this problem and compare its performance with an equivalent algorithm using Simulated Annealing (SA). Our tests show that the PSO outperforms the SA in terms of accuracy and consistency of the results, as well as speed of convergence.
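For readers unfamiliar with PSO, here is a minimal sketch of the algorithm the paper evaluates, applied to a toy objective rather than a silhouette-matching body model (all hyperparameter values are illustrative assumptions):

```python
import numpy as np

def pso_minimize(f, dim, n_particles=30, iters=200, bounds=(-5.0, 5.0),
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal Particle Swarm Optimisation: each particle remembers its
    personal best and is pulled toward both it and the swarm's global
    best, with inertia weight w and attraction coefficients c1, c2."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    x = rng.uniform(lo, hi, (n_particles, dim))      # positions
    v = np.zeros_like(x)                             # velocities
    pbest, pbest_f = x.copy(), np.apply_along_axis(f, 1, x)
    g = pbest[pbest_f.argmin()].copy()               # global best
    for _ in range(iters):
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = np.clip(x + v, lo, hi)
        fx = np.apply_along_axis(f, 1, x)
        improved = fx < pbest_f
        pbest[improved], pbest_f[improved] = x[improved], fx[improved]
        g = pbest[pbest_f.argmin()].copy()
    return g, float(pbest_f.min())
```

On a toy sphere function this converges to the origin; in the paper, the search space is the body model's pose parameters and the objective measures silhouette agreement.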

  • Conference Article
  • Cited by 4
  • 10.1109/sitis57111.2022.00021
In-Bed Pose Estimation of Covered and Uncovered Human Body from Thermal Camera Images Using Multi-Scale Stacked Hourglass (MSSHg) Network
  • Oct 1, 2022
  • Sahereh Obeidavi + 2 more

Estimating the pose of a human body lying in bed has enormous and valuable benefits in applications such as medicine and healthcare. Regular, long-term monitoring of in-bed poses requires patients to sustain certain poses, as these contribute to better recovery after certain surgeries or help control the symptoms of many complications. In particular, when the body is fully covered or the environment is completely dark, existing in-bed pose monitoring methods must be upgraded or new methods developed. An economical, contactless, vision-based system that is robust to both darkness and covers is a thermal imaging system using the long-wavelength IR technique. The dataset used in this study is the SLP dataset, in which several different poses with various covers were collected simultaneously; it is thermal-camera based and fully annotated for in-bed pose estimation. In this paper, a multi-scale stacked Hourglass (MSSHg) network is applied to improve the processing of thermal camera images for human pose estimation. Drawing on the multi-scale concept, the pre-processing network in this model extracts feature maps at different scales and assigns them to different stacked Hourglass networks. This approach achieved 96.8% accuracy for in-bed pose estimation under the PCK0.2 standard. Compared with the stacked Hourglass network, the MSSHg network is about 0.8% more accurate for covered-body pose estimation and as accurate or better for uncovered-body pose estimation at the final stage. It also converges faster and more easily at the beginning of training.
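PCK0.2, the accuracy standard cited above, counts a keypoint as correct when it lies within 20% of a reference length (commonly the torso diameter) of the ground truth. A minimal sketch (the exact reference length used by the SLP benchmark is an assumption here):

```python
import numpy as np

def pck(pred, gt, torso_diameter, alpha=0.2):
    """Percentage of Correct Keypoints: a joint counts as correct if
    its predicted position lies within alpha * torso_diameter of the
    ground truth. pred, gt: (joints, 2) arrays in pixels."""
    dists = np.linalg.norm(pred - gt, axis=-1)
    return float((dists <= alpha * torso_diameter).mean())
```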

  • Research Article
  • 10.1049/iet-cvi.2015.0283
Contextualised learning‐free three‐dimensional body pose estimation from two‐dimensional body features in monocular images
  • Apr 19, 2016
  • IET Computer Vision
  • Luis Unzueta + 4 more

In this study, the authors present a learning‐free method for inferring kinematically plausible three‐dimensional (3D) human body poses contextualised in a predefined 3D world, given a set of 2D body features extracted from monocular images. This contextualisation has the advantage of providing further semantic information about the observed scene. Their method consists of two main steps. Initially, the camera parameters are obtained by adjusting the reference floor of the predefined 3D world to four key‐points in the image. Then, the person's body part lengths and pose are estimated by fitting a parametrised multi‐body 3D kinematic model to 2D image body features, which can be located by state‐of‐the‐art body part detectors. The adjustment is carried out by a hierarchical optimisation procedure, where the model's scale variations are considered first and then the body part lengths are refined. At each iteration, tentative poses are inferred by a combination of efficient perspective‐n‐point camera pose estimation and constrained viewpoint‐dependent inverse kinematics. Experimental results show that their method obtains good accuracy with respect to state‐of‐the‐art alternatives, but without the need to learn 2D/3D mapping models from training data. Their method works efficiently, allowing its integration in video soft-sensing systems.

  • Research Article
  • Cited by 11
  • 10.3390/data7060079
UNIPD-BPE: Synchronized RGB-D and Inertial Data for Multimodal Body Pose Estimation and Tracking
  • Jun 9, 2022
  • Data
  • Mattia Guidolin + 2 more

The ability to estimate human motion without any external on-body sensor or marker is of paramount importance in a variety of fields, ranging from human–robot interaction and Industry 4.0 to surveillance and telerehabilitation. The recent development of portable, low-cost RGB-D cameras has pushed forward the accuracy of markerless motion capture systems. However, despite the widespread use of such sensors, a dataset including complex scenes with multiple interacting people, recorded with a calibrated network of RGB-D cameras and an external system for assessing pose estimation accuracy, is still missing. This paper presents the University of Padova Body Pose Estimation dataset (UNIPD-BPE), an extensive dataset for multi-sensor body pose estimation containing both single-person and multi-person sequences with up to 4 interacting people. A network of 5 Microsoft Azure Kinect RGB-D cameras is exploited to record synchronized high-definition RGB and depth data of the scene from multiple viewpoints, as well as to estimate the subjects’ poses using the Azure Kinect Body Tracking SDK. Simultaneously, full-body Xsens MVN Awinda inertial suits provide accurate poses and anatomical joint angles, as well as raw data from the 17 IMUs required by each suit. This dataset aims to push forward the development and validation of multi-camera markerless body pose estimation and tracking algorithms, as well as multimodal approaches focused on merging visual and inertial data.

  • Book Chapter
  • 10.1007/978-3-540-72847-4_27
Rao-Blackwellized Particle Filter for Human Appearance and Position Tracking
  • Jun 6, 2007
  • Jesús Martínez-Del-Rincón + 2 more

In human motion analysis, jointly estimating appearance, body pose, and location parameters is not always tractable due to its huge computational cost. In this paper, we propose a Rao-Blackwellized Particle Filter for human pose estimation and tracking. The advantage of the proposed approach is that Rao-Blackwellization allows the state variables to be split into two sets, one of which is calculated analytically from the posterior probability of the remaining ones. This procedure reduces the dimensionality of the Particle Filter, requiring fewer particles to achieve similar tracking performance. In this manner, location and size over the image are obtained stochastically using colour and motion cues, whereas body pose is solved analytically by applying learned human Point Distribution Models.
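A toy illustration of the split described above: the location sub-state is tracked with sampled particles, while a (here scalar, shared) pose sub-state is updated analytically with a Kalman step. This is a deliberate simplification; in a full RBPF each particle carries its own analytic conditional, and the paper's analytic part uses Point Distribution Models rather than a scalar Kalman filter:

```python
import numpy as np

rng = np.random.default_rng(1)

def rbpf_step(loc_particles, pose_mean, pose_var, obs_loc, obs_pose,
              motion_std=0.5, obs_loc_std=1.0, obs_pose_std=0.3):
    """One toy Rao-Blackwellized Particle Filter step: the location
    part of the state is sampled (particles), while the pose part is
    kept analytic via a scalar Kalman update."""
    n = len(loc_particles)
    # Sampled part: propagate and reweight the location particles.
    loc_particles = loc_particles + rng.normal(0.0, motion_std, n)
    w = np.exp(-0.5 * ((obs_loc - loc_particles) / obs_loc_std) ** 2)
    w /= w.sum()
    # Analytic part: scalar Kalman update of the pose sub-state.
    pose_var_pred = pose_var + 0.01               # process noise
    k = pose_var_pred / (pose_var_pred + obs_pose_std ** 2)
    pose_mean = pose_mean + k * (obs_pose - pose_mean)
    pose_var = (1 - k) * pose_var_pred
    # Resample the location particles.
    idx = rng.choice(n, size=n, p=w)
    return loc_particles[idx], pose_mean, pose_var
```

The dimensionality reduction the abstract mentions is visible here: only the location needs particles, so far fewer are required than if pose were sampled too.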

  • Conference Article
  • Cited by 1
  • 10.1145/2072572.2072595
3D perceptual shape feature-based body parts classification and pose estimation
  • Dec 1, 2011
  • Gang Hu + 1 more

Human body motion and gesture analysis has been boosted by the latest developments in 3D cameras and the high demands of emerging applications. Body part classification and pose estimation are essential for human body tracking and motion recognition. In this poster, we present a 3D perceptual shape feature-based approach for efficient body part classification and pose estimation. The contribution of this work is twofold: 1) by utilizing 3D image features and kinematic constraints, the classification task can be performed efficiently without huge training data or a costly learning process; 2) by applying the classification results, the complexity of body pose estimation can be significantly reduced. Experimental results demonstrate the system performance and exhibit the potential for complex body pose estimation and tracking.

  • Book Chapter
  • Cited by 6
  • 10.1007/978-3-319-48881-3_20
Combining Human Body Shape and Pose Estimation for Robust Upper Body Tracking Using a Depth Sensor
  • Jan 1, 2016
  • Thomas Probst + 2 more

Rapid and accurate estimation of a person’s upper body shape and real-time tracking of the pose in the presence of occlusions is crucial for many future assistive technologies, health care applications and telemedicine systems. We propose to tackle this challenging problem by combining data-driven and generative methods for both body shape and pose estimation. Our strategy comprises a subspace-based method to predict body shape directly from a single depth map input, and a random forest regression approach to obtain a sound initialization for pose estimation of the upper body. We propose a model-fitting strategy in order to refine the estimated body shape and to exploit body shape information for improving pose accuracy. During tracking, we feed refinement results back into the forest-based joint position regressor to stabilize and accelerate pose estimation over time. Our tracking framework is designed to cope with viewpoint limitations and occlusions due to dynamic objects.

  • Conference Article
  • Cited by 4
  • 10.1109/dsaa49011.2020.00033
Body Pose and Deep Hand-shape Feature Based American Sign Language Recognition
  • Oct 1, 2020
  • Al Amin Hosain + 4 more

This work presents an approach for American Sign Language (ASL) gesture recognition from videos. Gestures comprise various upper body motions involving hand shapes, motion of both hands, facial expression, and head movements. Previous approaches tackled this problem by directly learning 3D convolutional spatio-temporal models from video in simplified settings with uniform backgrounds. To handle more complex variation in appearance and backgrounds, we propose to exploit recent advances in 2D body pose estimation using Deep Convolutional Neural Networks trained on large corpora of human pose annotations. We use the trajectories of 2D skeletal data estimated from video to train a baseline recursive neural network gesture recognition model. The basic model is further extended using embeddings of hand images obtained from another hand-shape recognition model [15], with dynamics modeled by another recursive neural network. The final model learns how to fuse two Long Short-Term Memory (LSTM) recursive neural network models for skeletal and hand-image data. We train and evaluate this model on the GMU-ASL51 dataset of 12 users and 51 ASL gestures [8], demonstrating its superior performance compared to several baseline models.

  • Conference Article
  • Cited by 8
  • 10.5244/c.29.104
Fast Online Upper Body Pose Estimation from Video
  • Jan 1, 2015
  • Ming-Ching Chang + 4 more

Estimation of human body poses from video is an important problem in computer vision with many applications. Most existing methods for video pose estimation are offline in nature: all frames in the video are used to estimate the body pose in each frame. In this work, we describe a fast online video upper body pose estimation method (CDBN-MODEC), based on a conditional dynamic Bayesian network model, which predicts upper body pose in a frame without using information from future frames. Our method combines fast single-image pose estimation with the temporal correlation of poses between frames. We collect a new high-frame-rate upper body pose dataset that better reflects practical scenarios calling for fast online video pose estimation. When evaluated on this dataset and the VideoPose2 benchmark dataset, CDBN-MODEC achieves improvements in both performance and running efficiency over several state-of-the-art online video pose estimation methods.

  • Conference Article
  • Cited by 1
  • 10.1109/icps49255.2021.9468181
Estimating Human Pose with both Physical and Physiological Constraints
  • May 10, 2021
  • Lei Su + 2 more

With the steady improvement of rehabilitation-exercise theory, the rehabilitation medical big-data industry, which combines computer vision, sensors, human-computer interaction, and other advanced technologies, is developing rapidly. As an important way to understand and analyze human behavior, human pose estimation plays an important role in identifying patients' rehabilitation state, designing targeted individual rehabilitation plans, and improving the objectivity of plan design. Human pose recognition has therefore become a research hotspot in rehabilitation medicine. However, there are few studies on human pose estimation under scene constraints. To address this, a body pose estimation method based on physical and physiological constraint modeling is proposed in this paper. Three main constraints are established to make full use of 3D scene information and medical information to improve the body pose estimation results. Experiments show that the proposed method greatly improves 3D human pose estimation results, copes better with joint abnormalities during fitting, and accelerates fitting.
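One common way to encode a physiological constraint of the kind described above is a soft penalty on joint angles that leave their anatomical range, added to the pose-fitting objective. A sketch (the quadratic form and the example ranges are illustrative assumptions, not the paper's actual constraints):

```python
import numpy as np

def joint_limit_penalty(angles_deg, limits_deg):
    """Quadratic penalty that is zero while every joint angle stays
    inside its physiological range and grows with the violation,
    suitable as an extra term in a pose-fitting objective.

    angles_deg: (joints,) current joint angles in degrees.
    limits_deg: (joints, 2) per-joint [min, max] ranges in degrees.
    """
    lo, hi = limits_deg[:, 0], limits_deg[:, 1]
    violation = np.maximum(lo - angles_deg, 0) + np.maximum(angles_deg - hi, 0)
    return float(np.sum(violation ** 2))
```

Because the penalty is zero inside the feasible range, it steers the optimizer away from anatomically impossible fits without biasing plausible ones.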

  • Conference Article
  • 10.1109/cme55444.2022.10063314
Attention-Guided Video Inference for 3D Human posture estimation
  • Nov 4, 2022
  • Ke Zhang + 5 more

Body posture refers to the shape of the human body during movement; it reflects the body's external form and mental state, and describes physical health. Single-image human pose estimation is the foundation for using artificial intelligence to understand human posture, yet estimating human pose from monocular videos remains a challenging task, with three main difficulties: spatial feature representation, temporal information representation, and model computational complexity. To this end, we propose an improved Video Inference for Human Body Pose and Shape Estimation that performs feature extraction along the temporal dimension of image sequences. We also define a new temporal network architecture with a self-attention mechanism and apply it to temporal feature extraction in the original framework. Extensive experiments show that the method yields significant improvements on challenging 3D pose estimation datasets and extracts human pose features from videos more effectively.

  • Research Article
  • 10.1109/access.2020.3041926
Graph Convolutional Adversarial Network for Human Body Pose and Mesh Estimation
  • Jan 1, 2020
  • IEEE Access
  • Yuancheng Huang + 1 more

This paper studies the reconstruction of human body shape and pose from a single-view image. While most current work attempts to regress the parameters of a human body model such as the Skinned Multi-Person Linear Model (SMPL) or the Hand Model with Articulated and Non-rigid Deformations (MANO), these parametric approaches underperform compared to non-parametric approaches. Lacking spatial relationships from the input image, parametric approaches can hardly reconstruct the human body precisely; moreover, rotation-parameter regression is a complex task. Therefore, we introduce a novel graph convolutional neural network (Graph CNN)-based framework for estimating a non-parametric mesh model. Our key innovation is that the proposed model is trained in a generative adversarial manner. Firstly, the Graph CNN utilizes mesh topology to capture integral information about the full 3D human shape and generates a smoother, higher-quality human mesh model. Secondly, the discriminator in our network acts as a supervisor, specifying whether a human shape and pose are real or not; the generator is thereby encouraged to generate a human body mesh close to the manifold of the real human mesh distribution. Extensive experimental results demonstrate the effectiveness of our proposed framework. In contrast to state-of-the-art methods, our method achieves better performance in human shape and pose estimation.

  • Book Chapter
  • Cited by 2
  • 10.1007/978-3-030-60633-6_34
3D Human Body Shape and Pose Estimation from Depth Image
  • Jan 1, 2020
  • Lei Liu + 2 more

This work addresses the problem of 3D human body shape and pose estimation from a single depth image. Most deep-learning-based 3D human pose estimation methods use RGB images rather than depth images, while traditional optimization-based methods using depth images aim to establish point correspondences between the depth image and a template model. In this paper, we propose a novel method to estimate the 3D pose and shape of a human body from depth images. Specifically, based on joint features and the original depth features, we propose a spatial attention feature extractor that captures spatial local features of depth images and 3D joints by learning dynamic feature weights. In addition, we generalize our method to real depth data through a weakly-supervised method. We conduct extensive experiments on SURREAL, Human3.6M, DFAUST, and real depth images of human bodies. The experimental results demonstrate that our 3D human pose estimation method yields good performance.
