Abstract

People with complex communication needs can use a high-technology Augmentative and Alternative Communication (AAC) device to communicate with others. Researchers and clinicians often use data logging from high-tech AAC devices to analyze AAC user performance. However, existing automated data logging systems cannot differentiate the authorship of the data log when more than one user accesses the device. This issue reduces the validity of the data logs and increases the difficulty of performance analysis. This paper therefore presents a deep neural network-based visual analysis approach that processes videos of practice sessions to detect different AAC users. This approach has significant potential to improve the validity of data logs and ultimately to enhance AAC outcome measures.

Highlights

  • An estimated 3.7 million people in the United States have severe speech and language impairments due to medical conditions such as autism, cerebral palsy, aphasia, and amyotrophic lateral sclerosis

  • The producers of the log data can be identified when an augmentative and alternative communication (AAC) practice video is processed by the speaker-aware information logging (SAIL) system, in which a customized single shot detection deep neural network [27] performs the detection task (a minimal detection sketch follows this list)

  • Our system successfully detects above 90% of hands across three different AAC software applications and four different users
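
The Highlights mention a customized single shot detection (SSD) network for hand detection; that model is not reproduced here. As a rough illustration of frame-level detection with an SSD, the sketch below uses torchvision's off-the-shelf SSD300 as a stand-in for the paper's customized detector; the COCO-pretrained weights and the 0.5 score threshold are illustrative assumptions, not the paper's configuration:

```python
# Minimal sketch of frame-level detection with an SSD.
# Assumption: torchvision's COCO-pretrained SSD300 stands in for the paper's
# customized hand detector; threshold and preprocessing are illustrative.
import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

model = torchvision.models.detection.ssd300_vgg16(weights="DEFAULT")
model.eval()

def detect_objects(frame_path, score_threshold=0.5):
    """Return bounding boxes and confidence scores for one video frame."""
    frame = to_tensor(Image.open(frame_path).convert("RGB"))
    with torch.no_grad():
        prediction = model([frame])[0]  # dict with 'boxes', 'labels', 'scores'
    keep = prediction["scores"] >= score_threshold
    return prediction["boxes"][keep], prediction["scores"][keep]
```

In the SAIL setting, each retained box would then be attributed to a person (the AAC user versus another participant) so that log entries can be matched to their producer.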


Summary

INTRODUCTION

An estimated 3.7 million people in the United States have severe speech and language impairments due to medical conditions such as autism, cerebral palsy, aphasia, and amyotrophic lateral sclerosis. From the automated data log (ADL) alone, it is impossible to know whether the logged entries were generated by the AAC user or by other participants. This limitation reduces the validity of data logs and the accuracy of performance analysis, including semantic analysis, syntactical analysis, and usage efficiency [11], and further impedes the efficiency of AAC services. The current workaround is for clinicians to manually clean the ADLs (by filtering or labeling), either by applying predetermined guidelines [7] or by visually comparing the ADLs against videos recorded during intervention sessions. These methods are time-consuming, labor-intensive, and error-prone. The detected objects and their motion patterns are then incorporated into low-level micro-action descriptors (Bag-of-Micro-Actions), and Gaussian mixture model clustering with Fisher vector encoding is used to recognize the activities; a minimal sketch of this encoding follows.
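
To make that final step concrete, the sketch below fits a Gaussian mixture model (GMM) over pooled micro-action descriptors and encodes one clip as a Fisher vector. It is a minimal scikit-learn illustration; the descriptor dimensionality and the 16-component mixture are assumptions, not the paper's configuration:

```python
# Minimal sketch of GMM clustering plus Fisher vector encoding for
# Bag-of-Micro-Actions style descriptors. Shapes and the component count
# are illustrative assumptions, not the paper's configuration.
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_gmm(descriptors, n_components=16, seed=0):
    """Fit a diagonal-covariance GMM to pooled descriptors of shape (N, D)."""
    gmm = GaussianMixture(n_components=n_components,
                          covariance_type="diag", random_state=seed)
    gmm.fit(descriptors)
    return gmm

def fisher_vector(descriptors, gmm):
    """Encode one clip's descriptors (N, D) as a Fisher vector of length 2*K*D."""
    X = np.atleast_2d(descriptors)
    N, _ = X.shape
    gamma = gmm.predict_proba(X)                 # (N, K) soft assignments
    w, mu = gmm.weights_, gmm.means_             # (K,), (K, D)
    sigma = np.sqrt(gmm.covariances_)            # (K, D) diagonal std devs
    diff = (X[:, None, :] - mu[None]) / sigma    # (N, K, D) whitened residuals
    # Gradients w.r.t. the means and variances, averaged over descriptors
    g_mu = (gamma[..., None] * diff).sum(0) / (N * np.sqrt(w)[:, None])
    g_sig = (gamma[..., None] * (diff ** 2 - 1)).sum(0) / (N * np.sqrt(2 * w)[:, None])
    fv = np.hstack([g_mu.ravel(), g_sig.ravel()])
    fv = np.sign(fv) * np.sqrt(np.abs(fv))       # power normalization
    return fv / (np.linalg.norm(fv) + 1e-12)     # L2 normalization
```

The resulting fixed-length vector can then be fed to a standard classifier (e.g., a linear SVM) for activity recognition.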

METHODOLOGY
RELATED WORK
TRAINING
DATA AUGMENTATION
AAC EGOHAND DATASET
EXPERIMENTAL SETTING AND NETWORK TRAINING
NON-MAXIMUM SUPPRESSION
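
Detectors such as SSD emit many overlapping candidate boxes per object, which are conventionally pruned with greedy non-maximum suppression. Below is a minimal sketch of that standard procedure; the IoU threshold is an illustrative assumption rather than the paper's setting:

```python
# Minimal sketch of greedy non-maximum suppression (NMS).
# The 0.5 IoU threshold is an illustrative assumption.
import numpy as np

def nms(boxes, scores, iou_threshold=0.5):
    """Keep high-scoring boxes, dropping overlaps above the IoU cutoff.

    boxes: (N, 4) array of [x1, y1, x2, y2]; scores: (N,) confidences.
    Returns the indices of the retained boxes, best first.
    """
    x1, y1, x2, y2 = boxes.T
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]  # highest-scoring box first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        # Intersection of the current best box with the remaining candidates
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.maximum(0.0, xx2 - xx1) * np.maximum(0.0, yy2 - yy1)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        order = order[1:][iou <= iou_threshold]  # discard heavy overlaps
    return keep
```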
PERFORMANCE ASSESSMENT
CONCLUSION AND FUTURE WORK