ABSTRACT In the past, diagnosing the internal defects of ageing concrete members by sound tests has relied on the ability and experience of the inspector. With the development of deep neural networks, however, it has become feasible to build diagnostic tools that do not depend on subjective human judgement. In this study, sound samples are collected from a tapping sound test on concrete slab specimens with internally embedded artificial defects. Spectrograms are generated from the sound samples and used to train a lightweight convolutional neural network (CNN). The performance of the proposed CNN models at diagnosing the defects of the concrete slabs is then examined and compared with that of a convolutional autoencoder (CAE) and support vector regression (SVR). All the models performed well at distinguishing between healthy and defective concrete. Whereas noise can impair the performance of the CAE and SVR, the CNN is less affected, particularly when classifying sound samples with distinctive features. The CNN also has other advantages: it can output the physical area of the artificial defect directly, and its performance is less affected by the depth of the artificial defect and by the rebar embedded in the concrete. Even at increased defect depths, the CNN is still able to identify the defect area.
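As an illustration of the spectrogram step described above, the following is a minimal sketch of how a tapping sound sample could be converted into a log-magnitude spectrogram before being passed to a CNN. It is not the procedure used in this study: the function name `log_spectrogram`, the Hann window, and the frame length and hop size are all assumptions chosen for illustration, since the abstract does not state these settings.

```python
import numpy as np


def log_spectrogram(signal, frame_len=256, hop=128, eps=1e-10):
    """Return a log-magnitude spectrogram of shape (n_freq_bins, n_frames).

    frame_len and hop are illustrative assumptions, not values from the study.
    """
    signal = np.asarray(signal, dtype=float)
    # Zero-pad very short recordings so that at least one frame exists.
    if signal.size < frame_len:
        signal = np.pad(signal, (0, frame_len - signal.size))

    n_frames = 1 + (signal.size - frame_len) // hop
    window = np.hanning(frame_len)

    # Slice the signal into overlapping frames and taper each one.
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )

    # One-sided FFT of each frame gives frame_len // 2 + 1 frequency bins.
    magnitude = np.abs(np.fft.rfft(frames, axis=1))

    # Convert to decibels; eps avoids taking the log of zero.
    return 20.0 * np.log10(magnitude + eps).T
```

The resulting two-dimensional array can be treated as a single-channel image, which is the kind of input a convolutional network expects. In practice the window length trades frequency resolution against time resolution, so it would need to be tuned to the duration of the tapping response.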