This paper presents LEVIOSA, a novel framework for text- and speech-based uncrewed aerial vehicle (UAV) trajectory generation. By leveraging multimodal large language models (LLMs) to interpret natural language commands, the system converts text and audio inputs into executable flight paths for UAV swarms. The approach aims to simplify the complex task of multi-UAV trajectory generation, which has significant applications in fields such as search and rescue, agriculture, infrastructure inspection, and entertainment. The framework involves two key innovations: a multi-critic consensus mechanism to evaluate trajectory quality and a hierarchical prompt structuring for improved task execution. The innovations ensure fidelity to user goals. The framework integrates several multimodal LLMs for high-level planning, converting natural language inputs into 3D waypoints that guide UAV movements and per-UAV low-level controllers to control each UAV in executing its assigned 3D waypoint path based on the high-level plan. The methodology was tested on various trajectory types with promising accuracy, synchronization, and collision avoidance results. The findings pave the way for more intuitive human–robot interactions and advanced multi-UAV coordination.
Read full abstract