The Police Robot for Inspection and Mapping of Underwater Evidence (PRIME) is an uncrewed surface vehicle (USV) currently being developed for underwater search and recovery teams to assist in crime scene investigation. The USV maps underwater scenes using sidescan sonar (SSS). Test exercises use a clothed mannequin lying on the seafloor as a target object to evaluate system performance. A robust, automated method for detecting human body-shaped objects is required to maximise operational functionality. The use of a convolutional neural network (CNN) for automatic target recognition (ATR) is proposed. SSS image data acquired from four different locations during previous missions were used to build a dataset consisting of two classes, i.e., a binary classification problem. The target object class consisted of 166 196 × 196 pixel image snippets of the underwater mannequin, whereas the non-target class consisted of 13,054 examples. Due to the large class imbalance in the dataset, CNN models were trained with six different imbalance ratios. Two different pre-trained models (ResNet-50 and Xception) were compared, and trained via transfer learning. This paper presents results from the CNNs and details the training methods used. Larger datasets are shown to improve CNN performance despite class imbalance, achieving average F1 scores of 97% in image classification. Average F1 scores for target vs background classification with unseen data are only 47% but the end result is enhanced by combining multiple weak classification results in an ensemble average. The combined output, represented as a georeferenced heatmap, accurately indicates the target object location with a high detection confidence and one false positive of low confidence. The CNN approach shows improved object detection performance when compared to the currently used ATR method.