The rapid growth of deep learning and the Internet of Things has spurred the need for touchless biometric systems in areas where cleanliness and non-intrusive user interaction are critical. Traditional biometric methods such as fingerprint and hand recognition require physical contact and therefore pose hygiene risks, making face and speaker verification more viable alternatives for seamless authentication. The vulnerabilities and limitations of single-modality systems motivate a robust Multimodal Biometric Attendance System (MBAS). In this research, we introduce an MBAS based on feature-level fusion of speech and face data, combining the strengths of both modalities. Textural features derived from a person's facial appearance are integrated with dynamic speech information for liveness detection, reduced in dimensionality using linear discriminant analysis, and then classified with a Bi-LSTM network. This approach is proposed to increase recognition accuracy while strengthening security against spoofing attacks. Two publicly available datasets, DeepfakeTIMIT and AVSpeech, are extensively explored to evaluate different fusion strategies, classifier types, and standard performance metrics. The proposed system outperformed other cutting-edge biometric systems, achieving an accuracy of 97.51%, a precision of 99.10%, and an equal error rate of 2.48%. These findings affirm the effectiveness, enhanced security, and real-world applicability of the MBAS concept. Furthermore, this study underscores the importance of incorporating advanced liveness detection into secure, contactless biometric solutions that combine face and voice modalities for modern attendance management across industries.
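The pipeline summarized above (feature-level fusion, LDA-based dimensionality reduction, Bi-LSTM classification) can be illustrated with a minimal sketch. The snippet below assumes synthetic face-texture and speech feature vectors, scikit-learn's LinearDiscriminantAnalysis, and a small PyTorch Bi-LSTM; all dimensions, layer widths, and training settings are illustrative assumptions, not the configuration reported in this work.

```python
# Illustrative sketch only: feature-level fusion -> LDA -> Bi-LSTM classifier.
# Feature dimensions, sequence handling, and network sizes are assumed for
# demonstration and do not reflect the exact setup used in the paper.
import numpy as np
import torch
import torch.nn as nn
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
n_samples, n_classes = 200, 5
face_feats = rng.normal(size=(n_samples, 128))   # placeholder facial texture features
speech_feats = rng.normal(size=(n_samples, 64))  # placeholder speech features
labels = rng.integers(0, n_classes, size=n_samples)

# Feature-level fusion: concatenate face and speech feature vectors.
fused = np.concatenate([face_feats, speech_feats], axis=1)

# Dimensionality reduction with LDA (at most n_classes - 1 components).
lda = LinearDiscriminantAnalysis(n_components=n_classes - 1)
reduced = lda.fit_transform(fused, labels)

# Treat each reduced vector as a length-1 sequence for the Bi-LSTM.
x = torch.tensor(reduced, dtype=torch.float32).unsqueeze(1)  # (N, T=1, D)
y = torch.tensor(labels, dtype=torch.long)

class BiLSTMClassifier(nn.Module):
    def __init__(self, in_dim, hidden=32, n_classes=5):
        super().__init__()
        self.lstm = nn.LSTM(in_dim, hidden, batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden, n_classes)

    def forward(self, seq):
        out, _ = self.lstm(seq)          # (N, T, 2 * hidden)
        return self.fc(out[:, -1, :])    # classify from the final time step

model = BiLSTMClassifier(reduced.shape[1], n_classes=n_classes)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(5):
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()
    print(f"epoch {epoch}: loss={loss.item():.4f}")
```

In practice, the speech stream would be a genuine time series (so T > 1), which is where the bidirectional recurrence adds value; the length-1 sequence here only keeps the sketch compact.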