High-frequency acoustic cameras provide high-resolution sonar images that enable the development of signal processing algorithms for tasks such as 3-D reconstruction, target recognition, and precision navigation. However, the lack of large datasets has hampered the development of such algorithms. In this context, this paper proposes a model based on time-domain Kirchhoff approximation extensions (TDKAX) to generate realistic acoustic camera images, capable of accounting for multiple scattering from arbitrarily shaped targets. The images are formed by a coherent summation of different orders of transient echoes, followed by a pipeline of range conversion, time-varying gain compensation, noise addition, and image rendering. We validate the proposed model against tank experiments with isolated cylindrical concave reflectors and then present the ray paths derived from the coordinates of the intersection points. These ray paths help interpret the mechanisms behind image features in the absence of amplitude information. Results demonstrate that, by accounting for the temporal and spatial coherence of different orders of scattering, the proposed model successfully reproduces forward-looking sonar (FLS) images with multiple scattering arcs and coherent speckles.
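The image-formation chain described above can be illustrated for a single beam with a minimal sketch. It is not the TDKAX implementation: the echo time series are taken as given, and the function names, the nominal sound speed of 1500 m/s, the squared-range gain law (two-way spherical spreading), the complex Gaussian noise model, and the 40 dB log-compressed 8-bit rendering are all assumptions made for illustration.

```python
import math
import random

SOUND_SPEED = 1500.0  # m/s; assumed nominal value for water


def coherent_sum(echo_orders):
    """Add complex echo series of different scattering orders sample by sample."""
    n = len(echo_orders[0])
    return [sum(order[i] for order in echo_orders) for i in range(n)]


def time_to_range(n_samples, fs, c=SOUND_SPEED):
    """Convert two-way travel time t = i / fs to range r = c * t / 2."""
    return [c * (i / fs) / 2.0 for i in range(n_samples)]


def apply_tvg(signal, ranges, r_ref=1.0):
    """Assumed gain law: (r / r_ref)^2 offsets two-way spherical spreading."""
    return [s * (r / r_ref) ** 2 for s, r in zip(signal, ranges)]


def add_noise(signal, sigma, rng):
    """Add complex Gaussian noise (an assumed noise model)."""
    return [s + complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
            for s in signal]


def render(signal, dynamic_range_db=40.0):
    """Take the envelope, log-compress it, and map it to 8-bit pixel values."""
    envelope = [abs(s) for s in signal]
    peak = max(envelope)
    if peak == 0.0:
        return [0] * len(envelope)
    pixels = []
    for e in envelope:
        level_db = 20.0 * math.log10(max(e / peak, 1e-12))
        level = max(0.0, 1.0 + level_db / dynamic_range_db)
        pixels.append(round(255 * level))
    return pixels


def simulate_beam(echo_orders, fs, sigma=0.0, seed=0):
    """Run the chain: sum, range conversion, TVG, noise, rendering."""
    rng = random.Random(seed)
    total = coherent_sum(echo_orders)
    ranges = time_to_range(len(total), fs)
    compensated = apply_tvg(total, ranges)
    noisy = add_noise(compensated, sigma, rng)
    return ranges, render(noisy)
```

Because the orders are summed as complex values before the envelope is taken, echoes of opposite phase that arrive in the same sample cancel. An incoherent sum of magnitudes would not reproduce this interference, which is the kind of effect associated with speckle.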