Hydrophone arrays provide spatial information that can improve the detection efficiency and extend the sensing range of underwater acoustic communication (UWAC) signal detection. This study reformulates UWAC signal detection as an object detection problem in the frequency–azimuth (FRAZ) spectrum. Using Faster R-CNN as the signal detector, the proposed method jointly estimates the number of sources, modulation type, frequency band, and direction of arrival (DOA) of UWAC signals. It requires no prior knowledge of the number of signals or their frequency bands: precise frequency and DOA features are learned jointly during training and used for joint prediction at test time. Numerical studies show that the proposed method outperforms existing techniques at all tested signal-to-noise ratios (SNRs), particularly at low SNRs, achieving a detection F1 score of 0.96 at an SNR of −15 dB. We further verified its performance under varying modulation types, numbers of sources, grating lobe interference, strong signal interference, and array structure parameters. Finally, the practicality and robustness of the approach were evaluated in lake-based UWAC experiments, in which the model trained solely on simulated signals performed competitively.
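To make the FRAZ representation concrete, the following is a minimal sketch of how a frequency–azimuth power map could be computed from array data. It is not the procedure used in this study: it assumes a uniform linear array, conventional (delay-and-sum) frequency-domain beamforming, and a nominal sound speed of 1500 m/s. The function names `fraz_spectrum` and `simulate_plane_wave_tone` and all parameters are hypothetical, chosen for illustration only.

```python
import numpy as np


def fraz_spectrum(x, fs, spacing, c=1500.0, n_fft=256, angles_deg=None):
    """Frequency-azimuth power map via delay-and-sum beamforming.

    Assumptions (not from the paper): uniform linear array, plane waves,
    angle measured from broadside, sound speed c in m/s.
    x has shape (n_sensors, n_samples); returns (freqs, angles_deg, power)
    with power of shape (n_freqs, n_angles).
    """
    n_sensors, n_samples = x.shape
    if angles_deg is None:
        angles_deg = np.arange(-90.0, 91.0, 1.0)

    # Split each channel into non-overlapping, Hann-windowed snapshots.
    n_snap = n_samples // n_fft
    snaps = x[:, : n_snap * n_fft].reshape(n_sensors, n_snap, n_fft)
    spectra = np.fft.rfft(snaps * np.hanning(n_fft), axis=-1)  # (M, S, F)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)

    # Plane-wave delay at each sensor for each candidate angle.
    positions = np.arange(n_sensors) * spacing
    delays = np.outer(positions, np.sin(np.deg2rad(angles_deg))) / c  # (M, A)

    # Steering vectors a[f, m, a] = exp(-j 2 pi f tau[m, a]).
    steer = np.exp(-2j * np.pi * freqs[:, None, None] * delays[None, :, :])

    # Beam output per frequency, snapshot, and angle: a^H x / M.
    beams = np.einsum("fma,msf->fsa", steer.conj(), spectra) / n_sensors

    # Average beam power over snapshots.
    power = np.mean(np.abs(beams) ** 2, axis=1)  # (F, A)
    return freqs, angles_deg, power


def simulate_plane_wave_tone(n_sensors, n_samples, fs, spacing, f0,
                             angle_deg, c=1500.0, noise_std=0.0, seed=0):
    """Hypothetical test signal: one tone arriving as a plane wave plus noise."""
    rng = np.random.default_rng(seed)
    t = np.arange(n_samples) / fs
    delays = np.arange(n_sensors) * spacing * np.sin(np.deg2rad(angle_deg)) / c
    x = np.cos(2 * np.pi * f0 * (t[None, :] - delays[:, None]))
    return x + noise_std * rng.standard_normal(x.shape)
```

Under these assumptions, a narrowband source appears in the power map as a compact region centered on its frequency and bearing, which is the kind of structure an object detector can localize with a bounding box. The detector itself is not sketched here.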