In practical applications of passive sonar, the characteristic frequencies of acoustic signals are typically extracted with traditional time-frequency transformation methods such as Mel-frequency analysis, the short-time Fourier transform (STFT), and the wavelet transform (WT). However, these methods still suffer from limited resolution and information loss when transforming data collected over extended periods. In this paper, we present a two-stage approach for classifying marine mammal (MM) communication signals that combines pre-processing by cubic-spline interpolation (CSI) with a Siamese triplet-loss network model that uses a probability distribution in the latent (hidden) space. The CSI technique is tested with the STFT to generate STFT-CSI spectrograms, which enforce stronger relationships between characteristic frequencies, enhancing the connectivity of the spectrograms and highlighting frequency-based features. Additionally, stacking the spectrograms generated by three methods (Mel, STFT-CSI, and wavelet) into a single feature spectrogram exploits the advantages of each method across different frequency bands, resulting in a more effective classification process. The proposed Siamese Neural Network-Variational Auto-Encoder (SNN-VAE) model also overcomes drawbacks of the Auto-Encoder (AE) structure, including the loss of continuity and completeness during decoding. The classification accuracy for marine mammal signals with the SNN-VAE model increases by 11% and 20% compared with the AE model (2013), and by 6% compared with the ResNet model (2022), on the same real-world dataset from the National Oceanic and Atmospheric Administration (NOAA) of the United States of America.
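As a point of reference for the Siamese network mentioned above, the following is a minimal sketch of the standard triplet loss, which pulls an anchor embedding towards a sample of the same class and pushes it away from a sample of a different class. It is not the loss used in this work, whose exact form (including any VAE term) is not given here. The Euclidean distance, the margin of 1.0, and the 2-D embedding values are all illustrative assumptions.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet loss: max(0, d(a, p) - d(a, n) + margin).

    The margin value is an assumption chosen for illustration only.
    """
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Hypothetical 2-D embeddings (not taken from any real model or dataset).
anchor = [0.0, 0.0]
positive = [0.0, 1.0]   # same class as the anchor, distance 1
negative = [3.0, 4.0]   # different class, distance 5

# Well-separated triplet: 1 - 5 + 1 < 0, so the loss is clipped to zero.
easy = triplet_loss(anchor, positive, negative)

# Swapping the roles gives a violated triplet: 5 - 1 + 1 = 5.
hard = triplet_loss(anchor, negative, positive)
```

The loss is zero once the negative sample is farther from the anchor than the positive sample by at least the margin, so only triplets that violate this separation contribute to training.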