Abstract
People with complex communication needs can use high-technology Augmentative and Alternative Communication (AAC) devices to communicate with others. Researchers and clinicians often use data logging from high-tech AAC devices to analyze AAC user performance. However, existing automated data logging systems cannot differentiate the authorship of the data log when more than one user accesses the device, which reduces the validity of the data logs and complicates performance analysis. This paper therefore presents a deep neural network-based visual analysis approach that processes videos to detect different AAC users in practice sessions. This approach has significant potential to improve the validity of data logs and ultimately to enhance AAC outcome measures.
Highlights
An estimated 3.7 million people in the United States have severe speech and language impairments due to various medical conditions such as autism, cerebral palsy, aphasia, and amyotrophic lateral sclerosis.
The producers of the logged data can be identified when an augmentative and alternative communication (AAC) practice video is processed by the speaker-aware information logging (SAIL) system, where a customized single-shot-detection deep neural network [27] is used for this task.
Our system successfully detects more than 90% of the different hands across three different AAC software applications and four different users.
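Once hands have been detected and identified in each frame, log authorship can be resolved by matching each logged touch event to the hand box that contains it at that timestamp. The sketch below is a minimal, hypothetical illustration of that matching step; the event format, the `attribute_events` helper, and the user IDs are all assumptions, not the SAIL system's actual API.

```python
def contains(box, point):
    """Return True if (x, y) point lies inside the (x1, y1, x2, y2) box."""
    x1, y1, x2, y2 = box
    px, py = point
    return x1 <= px <= x2 and y1 <= py <= y2

def attribute_events(events, detections):
    """Attribute each logged touch event to a detected hand.

    events:     [(timestamp, (x, y)), ...] touch points from the data log.
    detections: {timestamp: [(user_id, box), ...]} hand boxes per frame.
    Returns one user_id (or None if no hand box matches) per event.
    """
    labels = []
    for t, point in events:
        owner = None
        for user_id, box in detections.get(t, []):
            if contains(box, point):
                owner = user_id
                break
        labels.append(owner)
    return labels

# Hypothetical example: two keystrokes, each covered by a different hand.
events = [(0, (50, 40)), (1, (200, 40))]
detections = {0: [("aac_user", (0, 0, 100, 100))],
              1: [("clinician", (150, 0, 250, 100))]}
print(attribute_events(events, detections))  # ['aac_user', 'clinician']
```

In practice the detector's per-frame output would replace the hand-coded `detections` dictionary, but the attribution logic stays the same.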
Summary
An estimated 3.7 million people in the United States have severe speech and language impairments due to various medical conditions such as autism, cerebral palsy, aphasia, and amyotrophic lateral sclerosis. High-tech AAC devices record automated data logs (ADLs) of device use, but reading the ADL data alone makes it impossible to know whether a given entry was generated by the AAC user or by other participants. This limitation reduces the validity of data logs and the accuracy of performance analysis, including semantic analysis, syntactic analysis, and usage efficiency [11], and further impedes the efficiency of AAC services. The current workaround is to have clinicians manually clean the ADLs (filtering or labeling) by following predetermined guidelines [7] or by visually comparing the ADLs against videos recorded during intervention sessions. These methods are time-consuming, labor-intensive, and error-prone. In our approach, the detected objects and their motion patterns are combined into low-level micro-action descriptors (a Bag-of-Micro-Actions); Gaussian mixture model clustering and Fisher vector encoding are then used to recognize the activities.
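The Fisher vector encoding step can be sketched as follows. This is a minimal illustration on synthetic descriptors, not the paper's implementation: only the gradient with respect to the GMM means is computed, the descriptor dimensionality and component count are arbitrary, and `fisher_vector` is a name chosen here for illustration.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fisher_vector(descriptors, gmm):
    """Simplified Fisher vector: gradient w.r.t. the GMM means only."""
    X = np.atleast_2d(descriptors)        # (N, D) micro-action descriptors
    N = X.shape[0]
    gamma = gmm.predict_proba(X)          # (N, K) soft assignments
    mu = gmm.means_                       # (K, D)
    sigma = np.sqrt(gmm.covariances_)     # (K, D) for 'diag' covariance
    w = gmm.weights_                      # (K,)
    # Posterior-weighted, variance-normalized deviations from each mean
    diff = (X[:, None, :] - mu[None, :, :]) / sigma[None, :, :]  # (N, K, D)
    fv = (gamma[:, :, None] * diff).sum(axis=0)                  # (K, D)
    fv /= (N * np.sqrt(w)[:, None])
    # Power- and L2-normalization, as is standard for Fisher vectors
    fv = np.sign(fv) * np.sqrt(np.abs(fv))
    return fv.ravel() / (np.linalg.norm(fv) + 1e-12)

rng = np.random.default_rng(0)
train = rng.normal(size=(500, 8))         # stand-in for training descriptors
gmm = GaussianMixture(n_components=4, covariance_type="diag",
                      random_state=0).fit(train)
clip = rng.normal(size=(30, 8))           # descriptors from one video clip
fv = fisher_vector(clip, gmm)
print(fv.shape)                           # one fixed-length vector per clip
```

The resulting fixed-length vector (here 4 components × 8 dimensions = 32 values) can then be fed to any standard classifier for activity recognition.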