As online vocal teaching grows in popularity, the demands on sound quality in these digital environments grow with it. This research addresses sound quality in real-time vocal teaching by integrating Blockchain and Machine Learning within an Internet of Things (IoT) security framework. We build a vocal recognition model on a Time-Delay Neural Network (TDNN) and enhance it with a Generated Feature Vector (GFV). The resulting GTDNN vocal recognition system is designed to secure and optimize web-based vocal teaching. Our experiments show that GTDNN outperforms the traditional TDNN and i-vector methods in feature vector extraction and adapts well to different speech environments. Across five speech settings, GTDNN achieves Equal Error Rates (EERs) of 11.3%, 12.0%, 4.9%, 6.2%, and 6.1%, lower than those of the comparison models. Its EER is 9.6% for short-duration speech and 2.3% for long-duration speech. The GTDNN system also achieves an overall pass rate of 94% for target speech and a high rejection rate for non-target speech, indicating high accuracy across a variety of speech environments.
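For readers unfamiliar with the metric, the EER is the operating point at which the false acceptance rate (non-target speech accepted) equals the false rejection rate (target speech rejected). The following is a minimal sketch of how an EER can be estimated from verification scores. The function name `equal_error_rate` and the sample scores are hypothetical and chosen only for illustration; this is not the evaluation code used in the experiments.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Estimate the EER by sweeping a decision threshold over all observed scores.

    A trial is accepted when its score is >= the threshold.
    Illustrative helper only; names and data are assumptions.
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap = float("inf")
    eer = None
    for t in thresholds:
        # False acceptance rate: non-target trials that pass the threshold.
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        # False rejection rate: target trials that fall below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the threshold where the two error rates are closest,
            # and report their midpoint as the EER estimate.
            best_gap = gap
            eer = (far + frr) / 2
    return eer


# Hypothetical similarity scores for target and non-target trials.
target = [0.9, 0.8, 0.7, 0.4]
nontarget = [0.6, 0.3, 0.2, 0.1]
print(f"EER = {equal_error_rate(target, nontarget):.1%}")
```

With only a handful of trials the estimate is coarse; on a realistic trial list the sweep approaches the point where the two error curves cross.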