A feature matching method based on a convolutional neural network (FM-CNN), inspired by matched-field processing (MFP), is proposed to estimate source depth in shallow water. The FM-CNN is trained on single-source acoustic field replicas generated by an acoustic propagation model in a range-independent environment, and is then used to estimate single and multiple source depths in range-independent and mildly range-dependent environments. Its performance is compared with that of the conventional MFP method. A sensitivity analysis of the two methods examines the impact of different environmental mismatches (i.e., bottom parameters, water column sound speed profile, and topography) on depth estimation performance in the East China Sea environment. Simulation results demonstrate that the FM-CNN is more robust to environmental mismatch than conventional MFP in both single and multiple source depth estimation. The proposed FM-CNN is validated on real data collected from four tracks in the East China Sea experiment. Experimental results demonstrate that the FM-CNN reliably estimates single and multiple source depths in complex environments, whereas MFP has a high failure probability owing to strong sidelobes and wide mainlobes.
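
For readers unfamiliar with the conventional MFP baseline, the following is a minimal illustrative sketch of a normalized Bartlett processor searching over source depth at a known range. It is not the propagation model or configuration used in this work: the replicas come from a toy ideal waveguide (isovelocity water, pressure-release surface, rigid bottom), and the function names `replica` and `bartlett_depth`, the frequency, sound speed, water depth, and array geometry are all assumptions chosen only for illustration.

```python
import numpy as np

def replica(source_depth, range_m, rx_depths,
            freq=200.0, c=1500.0, water_depth=100.0):
    """Toy normal-mode field on a vertical array (assumed ideal waveguide)."""
    k = 2.0 * np.pi * freq / c                      # acoustic wavenumber
    n_max = int(np.ceil(k * water_depth / np.pi)) + 1
    m = np.arange(1, n_max + 1)
    kz = (m - 0.5) * np.pi / water_depth            # vertical wavenumbers
    kz = kz[kz < k]                                 # keep propagating modes only
    kr = np.sqrt(k**2 - kz**2)                      # horizontal wavenumbers
    psi_s = np.sin(kz * source_depth)               # mode shapes at the source
    psi_r = np.sin(np.outer(rx_depths, kz))         # mode shapes at the receivers
    weights = psi_s * np.exp(1j * kr * range_m) / np.sqrt(kr * range_m)
    return psi_r @ weights                          # complex pressure per hydrophone

def bartlett_depth(data, range_m, rx_depths, candidate_depths):
    """Normalized Bartlett power over candidate depths; returns (estimate, power)."""
    d = data / np.linalg.norm(data)
    power = np.empty(len(candidate_depths))
    for i, z in enumerate(candidate_depths):
        w = replica(z, range_m, rx_depths)
        w = w / np.linalg.norm(w)
        power[i] = np.abs(np.vdot(w, d)) ** 2       # |w^H d|^2, bounded by 1
    return candidate_depths[np.argmax(power)], power

# Hypothetical noise-free, perfectly matched scenario.
rx_depths = np.linspace(5.0, 95.0, 19)              # assumed 19-element vertical array
data = replica(30.0, 5000.0, rx_depths)             # simulated source at 30 m depth
candidates = np.arange(2.0, 99.0, 1.0)
est_depth, power = bartlett_depth(data, 5000.0, rx_depths, candidates)
```

In this matched, noise-free case the Bartlett power peaks at the true depth. Under environmental mismatch the replicas no longer match the measured field, which is the setting in which the sidelobe and mainlobe behaviour of MFP becomes a concern.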