Channel State Information (CSI) has become popular in Wi-Fi-based passive sensing because it is cost-effective and broadly applicable. It gathers data without requiring active user interaction, which makes it usable in a wide range of contexts. However, existing methods suffer from low sensing accuracy, high processing requirements, and system instability. To address these issues, we introduce the WiSigPro Transformer, a CSI-based Wi-Fi passive sensing model designed for Human Activity Recognition. The model uses multi-head attention and positional encoding to capture complex spatiotemporal patterns, improving both robustness and accuracy. Our approach also integrates signal processing techniques that improve signal quality and feature extraction: wavelet denoising to reduce noise, median filtering to smooth the signal, and Power Spectral Density analysis using Welch’s method to capture frequency-domain features. In addition, we normalize the amplitude and phase data and apply feature engineering to extract statistical signal characteristics such as skewness and kurtosis. To address data imbalance, we apply the Synthetic Minority Over-sampling Technique (SMOTE) together with data augmentation, which balances class representation and improves model generalization. In simulations, the WiSigPro Transformer performs strongly on recognition accuracy, precision, recall, and F1-score. It achieves 98% accuracy and outperforms conventional neural networks such as CNN, RNN, LSTM, BiLSTM, and ABLSTM.
These results indicate that the WiSigPro Transformer is well suited to Wi-Fi-based passive sensing and activity recognition applications that depend on accurately capturing and analyzing spatiotemporal data.
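
As an illustration of part of the preprocessing described above, the following minimal sketch covers median filtering, normalization, and the skewness and kurtosis features for a single stream of CSI amplitudes. It is not the authors' implementation: the function names, the kernel size of 3, the edge-padding rule, and the use of population (biased) moments are all assumptions made for this example, and wavelet denoising, Welch's method, and SMOTE are left out to keep it dependency-free.

```python
from statistics import fmean, median, pstdev


def median_filter(signal, kernel_size=3):
    """Sliding-window median; edges are padded by repeating the end samples (assumed rule)."""
    if kernel_size < 1 or kernel_size % 2 == 0:
        raise ValueError("kernel_size must be a positive odd integer")
    half = kernel_size // 2
    padded = [signal[0]] * half + list(signal) + [signal[-1]] * half
    return [median(padded[i:i + kernel_size]) for i in range(len(signal))]


def zscore(signal):
    """Standardize to zero mean and unit population standard deviation."""
    mu, sigma = fmean(signal), pstdev(signal)
    if sigma == 0:
        return [0.0] * len(signal)  # constant input: no spread to scale by
    return [(x - mu) / sigma for x in signal]


def skewness(signal):
    """Third standardized moment (population form)."""
    return fmean([z ** 3 for z in zscore(signal)])


def kurtosis(signal):
    """Fourth standardized moment minus 3 (excess kurtosis, population form)."""
    return fmean([z ** 4 for z in zscore(signal)]) - 3.0


def extract_features(amplitudes, kernel_size=3):
    """Hypothetical helper: smooth, normalize, then compute two statistical features."""
    normalized = zscore(median_filter(amplitudes, kernel_size))
    return {"skewness": skewness(normalized), "kurtosis": kurtosis(normalized)}
```

For example, `median_filter([1, 9, 1, 1, 1])` suppresses the isolated spike at the second sample. In practice the same steps are usually written with `scipy.signal.medfilt`, `scipy.signal.welch`, `scipy.stats.skew`, and `scipy.stats.kurtosis`; the pure-Python version here only makes the arithmetic explicit.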