Seabed inspection is one of the most sought-after applications for autonomous underwater vehicles (AUVs). Acoustic sensors, such as side-scan sonars and forward-looking sonars (FLSs), are commonly favored over optical cameras for this task, since sonars are unaffected by illumination conditions and can provide long-range data. However, owing to their low resolution and scarcity of distinctive features, acoustic images are often hard to interpret with conventional automatic techniques, forcing human operators to analyze thousands of collected images to identify the so-called objects of potential interest (OPIs). In this article, we report the development of an automatic target recognition (ATR) methodology to identify and localize OPIs in FLS imagery. These detections have then been exploited to build a virtual world model by means of the probabilistic multiple hypothesis anchoring data association and model tracking algorithm. Distinct convolutional neural network models have been trained on a data set acquired in May 2019 at the Naval Support and Experimentation Centre (Centro di Supporto e Sperimentazione Navale, CSSN) basin in La Spezia, Italy. The ATR strategy has been successfully validated offline with data gathered in October 2019 at the same site, where the seabed targets had been replaced and relocated. The world modeling technique, in turn, has been preliminarily tested on a simulated scenario built with the Unmanned Underwater Vehicle (UUV) Simulator. Finally, both the ATR and world modeling systems were field-tested in October 2020 at the CSSN basin in a multivehicle architecture employing an acoustic channel between the FeelHippo AUV and an autonomous moving buoy.