Mel-Frequency Cepstral Coefficients (MFCC) and their differential coefficients are widely used as typical non-linear spectral envelope features in passive sonar target recognition. MFCC emphasizes frequency-domain information at the low-frequency end of the target's spectrum, while the differential coefficients add the dynamic behavior of MFCC in the time domain. However, these features still make insufficient use of time information. To address this issue, we propose two solutions: a feature fusion method, the Weighted Mel-Frequency Cepstral Coefficient based on Bhattacharyya Distance (BD-WMFCC), and a feature extraction method, VoicePrint Slice Statistics (VPSS), based on time–frequency joint processing. BD-WMFCC uses the Bhattacharyya distance to weight the differential coefficients and combines them with MFCC, maximizing the use of time information. VPSS is a linear spectral feature that exploits the stability of spectral intensity within a given bandwidth over continuous time: it extracts the steady-state characteristics of the target's spectrum over time in equally spaced frequency bands. Analysis of simulated and real data shows that VPSS offers outstanding separability and classification performance among both non-linear and linear spectral features, with strong stability. On real data, the accuracy of VPSS features with AlexNet is 3.8 % and 5.7 % higher than that of the spectral (Fre) and Mel features, respectively.
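As an illustration of the idea behind BD-WMFCC, the following minimal sketch computes a per-dimension Bhattacharyya distance between two target classes and uses it to weight differential coefficients before concatenating them with MFCC. It is not the procedure of this paper: the univariate Gaussian assumption, the normalization of the weights to unit sum, the fusion by concatenation, and the function names `bhattacharyya_distance_gaussian`, `bd_weights`, and `fuse_features` are all assumptions made for illustration.

```python
import numpy as np


def bhattacharyya_distance_gaussian(x1, x2, eps=1e-12):
    """Per-dimension Bhattacharyya distance between two classes.

    Assumption: each feature dimension is modeled as a univariate Gaussian.
    x1, x2: arrays of shape (n_frames, n_features), one per class.
    """
    m1, m2 = x1.mean(axis=0), x2.mean(axis=0)
    v1, v2 = x1.var(axis=0) + eps, x2.var(axis=0) + eps
    # Term driven by the separation of the class means.
    mean_term = 0.25 * (m1 - m2) ** 2 / (v1 + v2)
    # Term driven by the mismatch of the class variances.
    var_term = 0.5 * np.log((v1 + v2) / (2.0 * np.sqrt(v1 * v2)))
    return mean_term + var_term


def bd_weights(x1, x2):
    """Turn distances into weights that sum to one (illustrative choice)."""
    d = np.clip(bhattacharyya_distance_gaussian(x1, x2), 0.0, None)
    total = d.sum()
    if total <= 0.0:
        # No dimension separates the classes: fall back to uniform weights.
        return np.full(d.shape, 1.0 / d.size)
    return d / total


def fuse_features(mfcc, delta, weights):
    """Concatenate MFCC with weighted differential coefficients.

    mfcc, delta: arrays of shape (n_frames, n_coeffs); weights: (n_coeffs,).
    """
    return np.concatenate([mfcc, weights * delta], axis=1)
```

In this sketch, differential-coefficient dimensions that separate the two classes more strongly receive larger weights, so they contribute more to the fused feature vector. A multi-class setting would need some way to aggregate pairwise distances, which is left open here.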