<p>In this paper, we propose the Virtual Avatar Interactive Live Streaming System (VAILSS), our approach to streaming 3D avatar performances in interactive storytelling applications using multimodal interaction techniques. This study focuses on developing a virtual live interaction system for VTubers in the context of language learning. A live virtual performance in our system is produced by three components: 1) an avatar generation framework, 2) an AI motion capture system, and 3) an interactive storytelling engine. The system integrates artificial intelligence and uses motion capture to recognize facial expressions and movements, enabling virtual characters to deliver live storytelling services. Additionally, the system supports bi-directional interaction with players through tablet touch and voice, and it aims to make multimodal learning channels accessible to a broad audience. To gain deeper insight into the effectiveness of virtual role-playing in language learning, we organized a three-week workshop to investigate our system&rsquo;s impact on user experience. We invited 17 night-school freshmen from the NFU Department of Applied Foreign Languages, enrolled in an English course for tour guides, to take part in a virtual performance. The experimental results demonstrate that VAILSS positively affected students&rsquo; learning outcomes, particularly in enhancing foreign language acquisition.</p>
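<p>As a concrete illustration of how the three components might be wired together, the Python sketch below shows one possible event loop: motion-capture output drives the avatar while tablet touch and voice events feed the storytelling engine. All class and method names (MotionCapture, AvatarRenderer, StoryEngine, live_session_step) are hypothetical placeholders introduced here for illustration only; the paper does not prescribe this implementation.</p>
<pre>
# Hypothetical sketch of a VAILSS-style event loop; names are illustrative only.
from dataclasses import dataclass
from typing import Optional

@dataclass
class FacePose:
    expression: str        # e.g. "smile", "neutral"
    head_rotation: tuple   # (pitch, yaw, roll) in degrees

class MotionCapture:
    """Stub for the AI motion-capture component (e.g. webcam landmark tracking)."""
    def read_pose(self) -> FacePose:
        return FacePose(expression="neutral", head_rotation=(0.0, 0.0, 0.0))

class AvatarRenderer:
    """Stub for the avatar-generation/rendering component."""
    def apply(self, pose: FacePose) -> None:
        print(f"render avatar: {pose.expression}, rotation={pose.head_rotation}")

class StoryEngine:
    """Stub for the interactive storytelling engine."""
    def advance(self, touch: Optional[str], speech: Optional[str]) -> str:
        if speech:
            return f"respond to learner speech: {speech!r}"
        if touch:
            return f"branch story on touch target: {touch!r}"
        return "continue current scene"

def live_session_step(mocap, avatar, story, touch_event=None, speech_event=None):
    """One tick of the live performance: drive the avatar, then update the story."""
    avatar.apply(mocap.read_pose())                  # performer motion drives avatar
    return story.advance(touch_event, speech_event)  # learner input drives narrative

if __name__ == "__main__":
    mocap, avatar, story = MotionCapture(), AvatarRenderer(), StoryEngine()
    print(live_session_step(mocap, avatar, story, speech_event="Where is the museum?"))
</pre>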