Abstract Marine diesel engines operate in an environment with multiple excitation sources, so effective feature extraction and fault diagnosis from their vibration signals has become an active research topic. Time-domain synchronous averaging (TSA) is effective for processing such vibration signals, but it has drawbacks: the key-phase signal it requires is difficult to obtain, fault-feature information can be lost during processing, and the waveforms of frequency-multiplied components become mixed. To address these problems, a multi-scale time-domain averaging decomposition (MTAD) method is proposed and combined with signal-to-image conversion and a convolutional neural network (CNN) to perform fault diagnosis on a marine diesel engine. First, the vibration signals are decomposed by MTAD, which does not require a key-phase signal and effectively overcomes signal aliasing. Second, the decomposed signal components are converted into 2-D images by signal-to-image conversion. Finally, the 2-D images are input into the CNN for adaptive feature extraction and fault diagnosis. Experiments verify that the proposed method offers a degree of noise immunity and an advantage in marine diesel engine fault diagnosis.
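
As a rough illustration of two building blocks named in the abstract, the sketch below shows conventional time-domain synchronous averaging and one simple signal-to-image conversion. It is not the MTAD method, whose details the abstract does not give. The function names, the assumption that the number of samples per cycle is already known (the key-phase requirement that MTAD is said to avoid), and the min-max scaling to 8-bit grey levels with row-by-row reshaping are all illustrative assumptions.

```python
import numpy as np


def synchronous_average(signal, samples_per_cycle):
    """Conventional TSA sketch: average consecutive cycles of a known, fixed length.

    Assumption: samples_per_cycle is known in advance (in practice it would
    come from a key-phase signal); any incomplete trailing cycle is discarded.
    """
    signal = np.asarray(signal, dtype=float)
    n_cycles = signal.size // samples_per_cycle
    if n_cycles == 0:
        raise ValueError("signal is shorter than one cycle")
    # One row per cycle, then average across cycles sample by sample.
    cycles = signal[: n_cycles * samples_per_cycle].reshape(n_cycles, samples_per_cycle)
    return cycles.mean(axis=0)


def signal_to_image(segment, size):
    """Hypothetical conversion: min-max scale to 0..255 and fill a size x size image row by row."""
    segment = np.asarray(segment, dtype=float)
    if segment.size < size * size:
        raise ValueError("segment needs at least size * size samples")
    segment = segment[: size * size]
    span = segment.max() - segment.min()
    if span == 0:
        # A constant segment carries no contrast; return a black image.
        return np.zeros((size, size), dtype=np.uint8)
    scaled = (segment - segment.min()) / span
    return np.round(scaled * 255).astype(np.uint8).reshape(size, size)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    one_cycle = np.sin(2 * np.pi * np.arange(64) / 64)
    noisy = np.tile(one_cycle, 50) + 0.5 * rng.standard_normal(64 * 50)
    averaged = synchronous_average(noisy, 64)  # noise is attenuated by averaging
    image = signal_to_image(noisy, 32)         # 32 x 32 grey-level image
    print(averaged.shape, image.shape, image.dtype)
```

Averaging over cycles attenuates components that are not synchronous with the assumed cycle length, which is why an inaccurate or missing key-phase reference limits the conventional approach.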