Acoustic monitoring has recently shown great potential for diagnosing the condition of infrastructure. However, acoustic signals are subject to severe noise interference, which makes meaningful features difficult to extract and poses a considerable obstacle to the widespread application of acoustic monitoring. To tackle this problem, we propose an acceleration-guided acoustic signal denoising framework (AG-ASDF), based on a learnable wavelet transform, that automatically denoises the acoustic signal and extracts the relevant features under the guidance of the acceleration signal. The framework requires the acceleration signal only during the training stage. Therefore, only non-intrusive acoustic sensors need to be installed during the application phase, which is convenient and crucial for the condition monitoring of safety-critical infrastructure. We conduct a comparative study between the proposed AG-ASDF and other feature learning and extraction methods, using a multi-class support vector machine to evaluate how effectively slab track condition can be detected from acoustic signals. Different healthy and unhealthy states of slab tracks are imitated with three types of slab track supporting conditions on a railway test line. Classification based on the proposed AG-ASDF features outperforms the other feature extraction and learning methods with a significant improvement in accuracy.