Effects of Touch, Voice, and Multimodal Input on Multiple-UAV Monitoring During Simulated Manned-Unmanned Teaming in a Military Helicopter

Samuel J. Levulis, So Young Kim, Patricia R. DeLucia

doi:10.1177/1541931213601030

Abstract

A key component of the U.S. Army’s vision for future unmanned aerial vehicle (UAV) operations is the integration of UAVs into manned missions, an effort called manned-unmanned teaming (MUM-T; Department of Defense, 2010). One candidate application of MUM-T is to give the Air Mission Commander (AMC) of a team of Black Hawk helicopters control of multiple UAVs, which would provide advanced reconnaissance and real-time intelligence on the upcoming flight route and landing zones. An important design decision in developing a system that supports multi-UAV control by an AMC is the choice of input interface, for example, a touchscreen or voice commands. A variety of input methods is feasible from an engineering standpoint, but little is known about how the input interface affects AMC performance. The current study evaluated three input methods for a MUM-T supervisory control system used by an AMC located in a Black Hawk helicopter. The evaluation was conducted with simulation software developed by General Electric. Eighteen participants supervised a team of two helicopters and three UAVs as they traveled toward a landing zone to deploy ground troops. A primary monitor, located in front of the participant, presented displays used to monitor flight instruments and to supervise the manned and unmanned vehicles under the AMC’s control. A secondary monitor, located adjacent to the participant, presented displays used to inspect and classify aerial photographs taken by the UAVs. Participants were responsible for monitoring and responding to instrument warnings, classifying the aerial photographs as either neutral or hostile, and responding to radio communications. We manipulated interface input modality (touch, voice, multimodal) and workload (rate of photographs to classify). Participants completed three blocks of 8.5-minute experimental trials, one block for each input modality.
Results indicated that touch and multimodal input methods were superior to voice input. Participants were more efficient with touch and multimodal control than with voice control, as evidenced by shorter photograph classification times, a greater percentage of classified photographs, and shorter instrument warning response times. Touch and multimodal input also produced a greater percentage of correct responses to communication task queries, lower subjective workload, greater subjective situation awareness, and higher usability ratings. Multimodal input did not yield significant performance advantages over touch alone. Designers should carefully weigh these performance tradeoffs when selecting among candidate input methods during system development.

Full Text