This paper investigates the use of deep learning architectures for recognizing 3D gestures, focusing in particular on air-writing performed with hand or finger movements. Unlike traditional handwriting, which involves distinct pen-up and pen-down actions, air-writing lacks a defined sequence of such events. The absence of visual or tactile feedback and the strong inter-dependency among target gestures further complicate accurate character recognition.

From the user's perspective, air-writing can be executed in three distinct styles: connected, isolated, and overlapped. In the connected style, users write sequences of characters in free space, much as they would on paper from left to right. The isolated style confines each character's gesture within an imaginary 3D box, making segmentation easier but less natural. The overlapped style is the most complex, as adjacent characters are stacked within the same imaginary space. Variations in character size and shape during connected writing can degrade recognition accuracy even for the same user, whereas the constrained isolated style yields more consistent segmentation.

Advances in Motion Capture (MoCap) technology and growing computational capabilities have encouraged the exploration of air-writing in human-computer interaction (HCI) systems. Some systems rely on handheld tools or custom gloves to track motion trajectories, while low-cost depth sensors such as the Microsoft Kinect, Intel RealSense, and Leap Motion Controller (LMC) offer non-intrusive, real-time tracking. Kinect targets long-range 3D posture detection, whereas the LMC and RealSense provide millimeter-level precision for tracking hand and finger movements. Distinguishing genuine writing gestures from arbitrary hand movements in a continuous motion stream, however, remains a significant challenge in its own right, independent of character recognition.

Existing air-writing systems primarily target letter and digit recognition because word-level training datasets are scarce. To address this gap, we propose a solution that both detects and recognizes air-written characters within continuous motion data. Our system, built on the LMC sensor, includes a web interface that captures 1,200 air-written digits (0–9) while providing real-time visual feedback, and an efficient fingertip tracking mechanism that simulates pen-up and pen-down states. We frame air-writing recognition as a classification task, using both dynamic 3D trajectories (time-series data) and static 2D stroke projections (images) to train deep learning models with convolutional and recurrent architectures. Experimental results show near-perfect recognition rates achieved within milliseconds, making the system suitable for real-time deployment.

The paper is organized as follows: Section II reviews related work on air-writing. Section III outlines the methodology, including the LMC interface, dataset collection, and the deep learning models employed. Section IV describes the experimental setup and evaluation results. Finally, Section V concludes with key findings and future research directions.
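The abstract describes a fingertip tracking mechanism that simulates pen-up and pen-down states, but does not specify how those states are derived from the motion stream. Below is a minimal sketch of one common approach, assuming a virtual writing plane at a fixed depth threshold; the `Point` structure, the `PEN_DOWN_Z` value, and the `segment_strokes` helper are illustrative assumptions, not the paper's implementation.

```python
from dataclasses import dataclass
from typing import List

# Illustrative 3D fingertip sample; an LMC frame would supply similar values.
@dataclass
class Point:
    x: float  # lateral position (mm)
    y: float  # height above the sensor (mm)
    z: float  # depth toward/away from the user (mm)

PEN_DOWN_Z = 0.0  # assumed virtual writing plane; crossing it toggles the pen state

def segment_strokes(samples: List[Point]) -> List[List[Point]]:
    """Split a continuous fingertip stream into strokes.

    A sample with z <= PEN_DOWN_Z is treated as "pen down" (the finger has
    pushed through the virtual plane); anything else is "pen up" and closes
    the stroke in progress.
    """
    strokes: List[List[Point]] = []
    current: List[Point] = []
    for p in samples:
        if p.z <= PEN_DOWN_Z:          # pen down: accumulate trajectory points
            current.append(p)
        elif current:                  # pen up: close the stroke in progress
            strokes.append(current)
            current = []
    if current:                        # flush a stroke still open at stream end
        strokes.append(current)
    return strokes
```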
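The abstract also frames recognition as a ten-class classification task over static 2D stroke projections and dynamic 3D trajectories, without giving the architectures. The following Keras sketch shows minimal instances of the two model families mentioned (convolutional and recurrent); the layer sizes, input shapes, and names such as `IMG_SIZE` and `MAX_LEN` are assumptions, not the paper's reported configuration.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

IMG_SIZE = 28     # assumed side length of the rasterized 2D stroke projection
MAX_LEN = 128     # assumed fixed length of the resampled 3D trajectory
NUM_CLASSES = 10  # digits 0-9, as in the collected dataset

# Convolutional model for static 2D stroke projections (images).
cnn = models.Sequential([
    layers.Input(shape=(IMG_SIZE, IMG_SIZE, 1)),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])

# Recurrent model for dynamic 3D trajectories (time series of x, y, z).
rnn = models.Sequential([
    layers.Input(shape=(MAX_LEN, 3)),
    layers.Masking(mask_value=0.0),   # ignore zero-padded timesteps
    layers.LSTM(64),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])

for model in (cnn, rnn):
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
```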