Current approaches for evaluating noise-induced hearing loss (NIHL), such as the International Organization for Standardization (ISO) 1999 prediction model, rely mainly on noise energy and exposure time. They therefore ignore the intricate time-frequency characteristics of noise, which also play an important role in NIHL evaluation. In this study, a novel NIHL prediction model based on temporal and spectral feature extraction with an asymmetric convolution algorithm is proposed. Personal data and individual occupational noise records from 2214 workers across 23 factories in Zhejiang Province, China, were used. Beyond traditional metrics such as noise energy and exposure duration, the model emphasizes the time-frequency features of noise in NIHL assessment. To capture these features, random sampling, windowing, the short-time Fourier transform, and splicing were applied to the noise recordings to create time-frequency spectrograms. Two asymmetric convolution kernels were then used to extract the critical temporal and spectral features. These features, combined with personal information (e.g., age and length of service) in various configurations, served as model inputs. The optimal network structure was selected based on the area under the curve (AUC) from 10-fold cross-validation, together with the Wilcoxon signed-rank test. The proposed model was compared with support vector machine (SVM) and ISO 1999 models, and ablation experiments were conducted to verify the advantages of the new approach. The proposed model achieved an AUC of 0.7768 ± 0.0223 (mean ± SD), outperforming both the SVM model (AUC: 0.7504 ± 0.0273) and the ISO 1999 model (AUC: 0.5094 ± 0.0071). Wilcoxon signed-rank tests confirmed that the improvement was significant (p = 0.0025 versus ISO 1999; p = 0.00142 versus SVM). This study introduced a new NIHL prediction method that provides deeper insights into industrial noise exposure data.
The results demonstrated that the new model outperformed the ISO 1999 and SVM models. By combining time-frequency features with personal information, the proposed approach bridged the gap between conventional noise assessment and machine learning-based methods and could help improve the protection of workers' hearing.
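The spectrogram construction and asymmetric-convolution feature extraction summarized above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the segment length, the periodic Hann window, the hop size, and the number of random segments are hypothetical choices. The two asymmetric kernels are assumed to be a 1×3 kernel that slides along time and a 3×1 kernel that slides along frequency, with fixed averaging weights standing in for the weights a trained network would learn. The fusion step (per-frequency averaging, then appending age and length of service) and the helper names are likewise hypothetical. A naive DFT keeps the sketch dependency-free but is far slower than an FFT.

```python
import cmath
import math
import random


def hann(n):
    # Periodic Hann window of length n (an assumed window choice).
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def stft_magnitude(signal, win_len, hop):
    """Magnitude STFT via a naive DFT; returns a freq x time matrix."""
    w = hann(win_len)
    n_bins = win_len // 2 + 1  # one-sided spectrum of a real signal
    frames = []
    for start in range(0, len(signal) - win_len + 1, hop):
        seg = [signal[start + t] * w[t] for t in range(win_len)]
        frames.append([
            abs(sum(seg[t] * cmath.exp(-2j * math.pi * k * t / win_len)
                    for t in range(win_len)))
            for k in range(n_bins)
        ])
    # Transpose frames (time x freq) so that rows are frequency bins.
    return [list(row) for row in zip(*frames)]


def build_spectrogram(recording, n_segments, seg_len, win_len, hop, seed=0):
    """Randomly sample segments, STFT each one, and splice along the time axis."""
    rng = random.Random(seed)
    spliced = None
    for _ in range(n_segments):
        start = rng.randrange(len(recording) - seg_len + 1)
        spec = stft_magnitude(recording[start:start + seg_len], win_len, hop)
        spliced = spec if spliced is None else [a + b for a, b in zip(spliced, spec)]
    return spliced


def conv2d_valid(x, kernel):
    """2-D 'valid' cross-correlation, the operation a CNN layer applies."""
    kh, kw = len(kernel), len(kernel[0])
    return [
        [sum(x[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
         for j in range(len(x[0]) - kw + 1)]
        for i in range(len(x) - kh + 1)
    ]


# Hypothetical asymmetric kernels with fixed averaging weights (a trained
# model would learn these): 1x3 spans time, 3x1 spans frequency.
TEMPORAL_KERNEL = [[1 / 3, 1 / 3, 1 / 3]]
SPECTRAL_KERNEL = [[1 / 3], [1 / 3], [1 / 3]]


def fuse_features(spec, age, years_of_service):
    """Pool each feature map per frequency row and append personal data."""
    temporal = conv2d_valid(spec, TEMPORAL_KERNEL)
    spectral = conv2d_valid(spec, SPECTRAL_KERNEL)
    pooled_t = [sum(row) / len(row) for row in temporal]
    pooled_s = [sum(row) / len(row) for row in spectral]
    return pooled_t + pooled_s + [float(age), float(years_of_service)]


# Usage on a synthetic 1 kHz tone sampled at 8 kHz (not real noise data).
fs = 8000
recording = [math.sin(2 * math.pi * 1000 * t / fs) for t in range(2000)]
spec = build_spectrogram(recording, n_segments=3, seg_len=256, win_len=64, hop=32)
features = fuse_features(spec, age=40, years_of_service=12)
```

Each 256-sample segment yields 7 frames of 33 bins, so splicing three segments gives a 33 × 21 spectrogram. Because the tone lies exactly on DFT bin 8 (8000 Hz / 64 = 125 Hz per bin), every column peaks at row 8. Separating the kernels into a time-only and a frequency-only shape lets each one respond to one axis of the spectrogram, which is the intuition behind treating temporal and spectral features separately.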
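The evaluation protocol described above (AUC under 10-fold cross-validation, with Wilcoxon signed-rank tests comparing models) can likewise be sketched. The assumptions here are loud ones: the tests are assumed to pair the per-fold AUCs of two models, the fold splitter is contiguous and unstratified (a real study would usually shuffle and stratify by outcome), and the exact test assumes no zero or tied differences. The helper names and the per-fold AUC values in the usage lines are hypothetical and are not the study's results.

```python
import itertools


def auc(labels, scores):
    """AUC as P(score of a random positive > score of a random negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def kfold_indices(n, k):
    """Split range(n) into k contiguous folds whose sizes differ by at most one."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def wilcoxon_exact_p(a, b):
    """Two-sided exact Wilcoxon signed-rank p-value for paired samples.

    Assumes no zero differences and no tied absolute differences.
    """
    d = [x - y for x, y in zip(a, b)]
    n = len(d)
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0] * n
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    total = n * (n + 1) // 2
    w_plus = sum(r for r, di in zip(ranks, d) if di > 0)
    observed = min(w_plus, total - w_plus)
    # Under H0 each rank is equally likely to carry either sign: enumerate all 2^n.
    extreme = 0
    for signs in itertools.product((0, 1), repeat=n):
        wp = sum(r for r, s in zip(range(1, n + 1), signs) if s)
        if min(wp, total - wp) <= observed:
            extreme += 1
    return extreme / 2 ** n


# Hypothetical per-fold AUCs for two models (illustrative values only).
model_a = [0.80] * 10
model_b = [0.80 - 0.001 * (i + 1) for i in range(10)]
p_value = wilcoxon_exact_p(model_a, model_b)  # model A wins in every fold
```

With all ten paired differences favoring one model, the exact two-sided p-value is 2/1024 ≈ 0.00195, the smallest value an exact test can reach with ten pairs. For 2214 workers, `kfold_indices(2214, 10)` yields four folds of 222 and six folds of 221.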