Seismic time-frequency (TF) transforms are essential tools in reservoir interpretation and signal processing, particularly for characterizing frequency variations in non-stationary seismic data. Sparse TF transforms, which leverage sparse coding (SC), have recently gained significant attention in the geosciences because they can achieve high TF resolution. However, the iterative solvers typically employed in sparse TF transforms are computationally intensive, which makes them impractical for real seismic data analysis. To address this issue, we propose an interpretable convolutional sparse coding (CSC) network that achieves high TF resolution. The proposed model, named ULISTANet, is built on the traditional short-time Fourier transform (STFT) and a modified UNet. In this design, we replace the conventional convolutional layers of the UNet with learnable iterative shrinkage thresholding algorithm (LISTA) blocks, a specialized form of CSC. The LISTA block evolves from the traditional iterative shrinkage thresholding algorithm (ISTA) and is designed to extract sparse features more effectively. Furthermore, we create a synthetic dataset of complex frequency-modulated signals to train ULISTANet. Finally, we validate the proposed method on both synthetic and field data, demonstrating its potential for enhanced seismic data analysis.
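To illustrate how a LISTA block relates to ISTA, the following is a minimal NumPy sketch under simplifying assumptions: a dense, real-valued dictionary `D` instead of the convolutional, STFT-based operators used in ULISTANet, and hypothetical names (`soft_threshold`, `ista`, `lista_forward`, `W`, `S`, `theta`) that do not come from the proposed method. ISTA repeats a gradient step followed by soft thresholding; an unrolled LISTA network has the same form but treats the matrices and the threshold as trainable parameters.

```python
import numpy as np

def soft_threshold(v, theta):
    """Element-wise shrinkage: pull values toward zero by theta."""
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def ista(D, y, lam=0.1, n_iter=50):
    """Classical ISTA for min_x 0.5*||y - D x||^2 + lam*||x||_1."""
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the gradient
    x = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ x - y)           # gradient of the data-fit term
        x = soft_threshold(x - grad / L, lam / L)
    return x

def lista_forward(y, W, S, theta, n_layers=50):
    """Unrolled LISTA pass: x <- soft(W y + S x, theta).

    In a trained network W, S and theta are learned (assumed shared
    across layers here for brevity); they are plain arrays in this sketch.
    """
    x = np.zeros(S.shape[0])
    for _ in range(n_layers):
        x = soft_threshold(W @ y + S @ x, theta)
    return x

def ista_init(D, lam=0.1):
    """Parameter values at which LISTA reproduces ISTA exactly."""
    L = np.linalg.norm(D, 2) ** 2
    W = D.T / L
    S = np.eye(D.shape[1]) - (D.T @ D) / L
    return W, S, lam / L
```

With the initialization from `ista_init`, `lista_forward` matches `ista` for the same number of iterations. Training then adjusts `W`, `S`, and `theta` so that far fewer layers are needed, which is the usual motivation for replacing an iterative solver with an unrolled network.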