Abstract
In recent years, speech interfaces have been proposed for placing shopping orders or making restaurant reservations using speech alone. It is easy to imagine that such systems will soon be installed in cars. However, for a driver to use a speech interface capable of such tasks while driving, the system needs to understand the driver's driving situation and control the timing of its speech guidance accordingly. We focused on the fact that there are scenes (SCD: Scenes of Concentrate Driving) in which the driver unconsciously and temporarily interrupts the speech operation to concentrate on driving. However, it is not yet known in which scenes voice dialogue becomes a burden. In this study, we identified driving scenes where dialogue is a burden (SCD), clarified the relationship between SCD and driving behavior, and considered a method to estimate SCD automatically. We define SCD as a situation in which the driver temporarily suspends the speech operation and wants to perform only the driving operation, although the system can perform both the speech operation and the driving operation. Under SCD, even if the speech interface presents guidance, the driver cannot understand its contents; likewise, even if the system prompts for an answer by speech input, the driver does not answer. To confirm the existence of SCD, we created nine driving scenes that require different driving operations (lane change, overtaking, narrow road, right turn, etc.) and reproduced them using a Driving Simulator (DS). We also created tasks that prompt the driver's voice input and presented them just before each driving scene, so that we could check whether the driver interrupted the dialogue during the driving operations needed to pass the scene. We recorded speech and collected driving signals (steering angle, throttle opening, etc.) for 15 licensed drivers in their 20s (11 men and 4 women). First, for each driving scene, we checked whether the driver could answer the immediately preceding question presented by an operator, and confirmed that SCD occurs in various driving scenes. In the driving scenes involving a lane change, the utterance was interrupted in more than 40% of the tasks. In contrast, in the driving scenes in which the driver steers through left and right curves while following the preceding vehicle in the same lane, the utterance was seldom interrupted. These results confirm the existence of driving scenes in which speech is likely to be interrupted, that is, SCD. Next, we considered a machine learning model to estimate the occurrence of SCD automatically, using only the driving signals. We compared the classification accuracy of SCD occurrence across several analysis window lengths, from 0.5 seconds to 5 seconds in 0.5-second increments. The results showed that a trained Support Vector Machine (SVM) could provide a classification accuracy of about 85% for two classes (SCD or not) and 82% for three classes (SCD, semi-SCD, regular) with a window size of 2 seconds.
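The window-length comparison described above can be illustrated with a minimal sketch. It is not the study's implementation: the abstract does not state the sampling rate, whether windows overlap, or which features were computed, so the 10 Hz rate, the non-overlapping windows, the three summary statistics, and the function names `window_lengths` and `window_features` are all assumptions made for illustration. The resulting feature vectors are what one might pass to a classifier such as an SVM.

```python
from statistics import mean, pstdev


def window_lengths(start=0.5, stop=5.0, step=0.5):
    """Analysis window lengths in seconds: 0.5, 1.0, ..., 5.0."""
    count = int(round((stop - start) / step)) + 1
    return [round(start + i * step, 10) for i in range(count)]


def window_features(samples, sample_rate_hz, window_s):
    """Split one driving signal into non-overlapping windows and
    summarize each window as (mean, standard deviation, range).

    The choice of features is an assumption, not taken from the study.
    """
    size = int(round(window_s * sample_rate_hz))
    if size < 1:
        raise ValueError("window shorter than one sample")
    features = []
    # Trailing samples that do not fill a whole window are dropped.
    for start in range(0, len(samples) - size + 1, size):
        chunk = samples[start:start + size]
        features.append((mean(chunk), pstdev(chunk), max(chunk) - min(chunk)))
    return features


# Hypothetical steering-angle trace: 4 seconds at an assumed 10 Hz,
# straight driving followed by a steadily increasing steering angle.
steering = [0.0] * 20 + [float(i) for i in range(20)]

for w in window_lengths():
    feats = window_features(steering, sample_rate_hz=10, window_s=w)
    print(f"window {w:.1f} s -> {len(feats)} feature vectors")
```

Shorter windows yield more feature vectors per recording but see less of each maneuver, while longer windows capture more context at the cost of fewer examples and slower reaction; this is the trade-off a sweep over window lengths is meant to explore.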