Ultrasound-modulated optical tomography (UOT) combines the strengths of light and sound, enabling deep-tissue imaging with optical absorption contrast and acoustic resolution. Camera-based UOT, with its parallel detection capability, is well suited to handling weak light–sound interactions. However, the limited frame rate of the camera typically results in poor axial resolution and poses challenges for holographic measurements. In this study, we introduce intersected ultrasonic modulation to address these limitations, achieving equal resolution in the lateral and axial dimensions with referenceless measurements. As a proof of concept, we constructed an imaging system and demonstrated two-dimensional imaging of absorptive targets buried inside a scattering medium. This approach offers a route to improved imaging resolution in camera-based UOT and shows potential for future diagnostic applications.