Social robots are intelligent programs that run autonomously, publish information automatically, and interact with users on social media platforms. This article examines a speech recognition algorithm for interactive artificial intelligence systems that combines the Hanning window function with the Viterbi algorithm, and evaluates its application value in an English video teaching system through detailed experiments. Speech feature extraction supplies the input representation on which the subsequent recognition process depends. Applying the Hanning window function during preprocessing tapers each speech frame toward its edges, which reduces spectral leakage. The Viterbi algorithm is the central component of the speech recognition model: it finds the most likely state sequence for an utterance and thereby handles the time variation and continuity of speech signals. By combining speech feature extraction, the Hanning window function, and the Viterbi algorithm, the study builds a system with strong recognition ability and interactivity and applies it to an English video teaching system. The English video teaching system aims to improve students’ learning process; combining it with interactive speech recognition technology can improve students’ experience of learning English and accommodate their individual learning habits.
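
The two named building blocks can be illustrated with a minimal sketch. It assumes a symmetric Hann (Hanning) window applied to overlapping frames and standard Viterbi decoding over a hidden Markov model in the log domain. The function names (`hann_window`, `frame_signal`, `viterbi`) and the frame parameters are hypothetical and chosen for illustration; this is not the authors' implementation.

```python
import math


def hann_window(n):
    """Symmetric Hann (Hanning) window of length n."""
    if n == 1:
        return [1.0]
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def frame_signal(signal, frame_len, hop):
    """Split a signal into overlapping frames and taper each with a Hann window."""
    window = hann_window(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frames.append([signal[start + i] * window[i] for i in range(frame_len)])
    return frames


def viterbi(log_init, log_trans, log_emit):
    """Return the most likely state path of an HMM.

    log_init[s]      log-probability of starting in state s
    log_trans[p][s]  log-probability of moving from state p to state s
    log_emit[t][s]   log-likelihood of the frame at time t under state s
    """
    n_states = len(log_init)
    score = [log_init[s] + log_emit[0][s] for s in range(n_states)]
    backptrs = []
    for t in range(1, len(log_emit)):
        new_score, ptr = [], []
        for s in range(n_states):
            # Best predecessor for state s at time t.
            best_prev = max(range(n_states), key=lambda p: score[p] + log_trans[p][s])
            new_score.append(score[best_prev] + log_trans[best_prev][s] + log_emit[t][s])
            ptr.append(best_prev)
        score = new_score
        backptrs.append(ptr)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path
```

In a full recognizer, the windowed frames would first be converted to feature vectors (the abstract does not say which features are used), and the per-frame emission scores passed to `viterbi` would come from an acoustic model trained on those features.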