This research aimed to develop a dataset of acoustic images recorded by a forward-looking sonar mounted on an underwater vehicle, enabling the classification of unexploded ordnance (UXO) and objects other than unexploded ordnance (nonUXO). The dataset was obtained from digital twin simulations performed in the Gazebo environment using plugins developed within the DAVE project. It consists of 69,444 sample images of 512 × 399 resolution organized in two classes annotated as UXO and nonUXO. The dataset was then evaluated with state-of-the-art image classification methods using off-the-shelf models and transfer learning techniques. The evaluated models were VGG16, ResNet34, ResNet50, ViT, RegNet, and Swin Transformer. The goal was to establish a baseline for the development of other, more specialized machine learning models. The neural network experiments comprised two stages: fine-tuning only the final layers and fine-tuning the entire network. The experiments revealed that high accuracy requires fine-tuning the entire network; under this condition, all the models achieved comparable performance, reaching 98% balanced accuracy. Surprisingly, the highest accuracy was obtained by the VGG16 model.
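Because the results are reported as balanced accuracy, a brief illustration of that metric may be useful. The following is a minimal sketch, assuming the standard definition (the mean of per-class recall); the function name `balanced_accuracy` and the toy labels are hypothetical and are not taken from the experiments.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Return the mean of per-class recall over the classes present in y_true."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    total = defaultdict(int)    # number of samples per true class
    correct = defaultdict(int)  # correctly classified samples per true class
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1

    recalls = [correct[cls] / total[cls] for cls in total]
    return sum(recalls) / len(recalls)


# Hypothetical, deliberately imbalanced toy labels (not data from the study).
y_true = ["UXO"] * 2 + ["nonUXO"] * 8
y_pred = ["UXO", "nonUXO"] + ["nonUXO"] * 8

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"plain accuracy:    {plain_accuracy:.2f}")
print(f"balanced accuracy: {balanced_accuracy(y_true, y_pred):.2f}")
```

In this toy case one of the two UXO samples is missed, so the UXO recall is 0.5 while the nonUXO recall is 1.0. The balanced accuracy is therefore lower than the plain accuracy, which is why the metric is a common choice for two-class problems in which the classes may be unevenly represented.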