As eyewear devices such as smart glasses become more common, it is important to provide input methods that can be used at any time and by a wide range of people. Silent speech interaction (SSI) is promising as a hands-free input method for various situations and users, including people who have difficulty producing voiced speech. However, previous methods have relied on sensor devices that are difficult to use anytime and anywhere. We propose an SSI method that uses an eyewear device equipped with infrared distance sensors. The proposed method measures the facial skin movements that accompany speech with the infrared distance sensors mounted on the eyewear and recognizes silent speech commands by applying machine learning to the time-series sensor data. We applied the proposed method in a prototype system whose sensor device consists of eyewear and ear-mounted microphones to measure the movements of the cheek, jaw joint, and jaw. Evaluations 1 and 2 showed that five speech commands could be recognized with an F-value of 0.90 and ten longer speech commands with an F-value of 0.83. Evaluation 3 showed how recognition accuracy changes with the combination of sensor points. Evaluation 4 examined whether the proposed method scales to a larger set of 21 speech commands, using a deep learning LSTM and a combination of dynamic time warping (DTW) and k-nearest neighbors (kNN). Evaluation 5 examined recognition accuracy under conditions that can degrade it, such as re-attaching the device and walking. These results show the feasibility of the proposed method as a simple hands-free input interface, for example for media players and voice assistants. Our study provides the first wearable sensing method that can easily add SSI functions to eyewear devices.
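The abstract names a combination of DTW and kNN as one of the classifiers used in Evaluation 4. The following Python is a minimal sketch of that idea, not the authors' implementation: an unseen silent-speech utterance, represented as a multichannel time series from the infrared distance sensors, is labeled by its DTW-nearest training templates. The channel count, sequence length, command labels, and the choice of k = 1 are illustrative assumptions.

```python
# Sketch (not the authors' code): DTW + kNN classification of silent speech
# commands from multichannel infrared distance sensor time series.
import numpy as np

def dtw_distance(a: np.ndarray, b: np.ndarray) -> float:
    """DTW distance between two (time, channels) sequences, Euclidean frame cost."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]

def knn_dtw_predict(templates, labels, query, k: int = 1):
    """Label a query sequence by majority vote among its k DTW-nearest templates."""
    dists = [dtw_distance(query, t) for t in templates]
    nearest = np.argsort(dists)[:k]
    votes = [labels[i] for i in nearest]
    return max(set(votes), key=votes.count)

# Hypothetical usage: 3 sensor channels (e.g., cheek, jaw joint, jaw) sampled
# over 60 frames per utterance; labels and data are made up for illustration.
rng = np.random.default_rng(0)
templates = [rng.standard_normal((60, 3)) for _ in range(10)]  # training utterances
labels = ["play", "pause", "next", "back", "stop"] * 2         # command labels
query = templates[3] + 0.05 * rng.standard_normal((60, 3))     # noisy repetition
print(knn_dtw_predict(templates, labels, query, k=1))          # -> "back"
```

DTW tolerates the timing variation between repetitions of the same command, which is why template matching with a small number of recordings per command is a natural baseline for command sets of this size.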