Abstract

Noise inside a train is a paradox: although harmful to passenger health, it is useful to operators because it provides insight into the working condition of vehicles and tracks. Methods for identifying defects from interior noise signals have recently begun to emerge, and representation learning is the foundation that allows deep neural network models to capture the key information and structure of the data. To provide foundational data for track fault detection, a representation learning framework for interior noise, named the interior noise representation framework, is introduced. The method comprises: (i) a wavelet transform that represents the original noise signal, together with a soft and hard denoising module for dataset denoising; (ii) a deep residual convolutional denoising variational autoencoder (VAE) module that performs representation learning with a VAE and deep residual convolutional neural networks, enabling richer data augmentation for sparsely labeled samples by manipulating the embedding space; (iii) a deep embedding clustering submodule that balances reconstruction and clustering features through their joint optimization, categorizing metro noise into three distinct classes and effectively discriminating significantly different features. The experimental results show that, compared with traditional mechanism-based models for characterizing interior noise, this approach offers a general, data-driven analysis framework and provides a foundational model for downstream tasks.
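The abstract does not state which thresholding rules the soft and hard denoising module uses. As a minimal sketch, assuming the standard textbook definitions of hard and soft thresholding applied to wavelet coefficients, the two rules could look as follows. The function names and the sample coefficients are hypothetical and chosen only for illustration; how the framework combines the two rules is not specified here.

```python
import math


def hard_threshold(coeffs, thr):
    """Hard rule: keep coefficients whose magnitude exceeds thr, zero the rest."""
    return [c if abs(c) > thr else 0.0 for c in coeffs]


def soft_threshold(coeffs, thr):
    """Soft rule: shrink every magnitude by thr, clamping at zero, keeping the sign."""
    return [math.copysign(max(abs(c) - thr, 0.0), c) for c in coeffs]


# Hypothetical wavelet detail coefficients, not data from the paper.
example_coeffs = [-3.0, -0.5, 0.2, 1.0, 2.5]
hard_result = hard_threshold(example_coeffs, 1.0)
soft_result = soft_threshold(example_coeffs, 1.0)
```

Hard thresholding leaves the surviving coefficients unchanged but introduces a discontinuity at the threshold, whereas soft thresholding is continuous at the cost of shrinking large coefficients, which is one plausible reason a module would offer both.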