High-power ultrasonic horns operating at low frequency are known to generate a cone-shaped cavitation bubble cloud beneath them. The physical processes that produce this conical structure remain unclear, mainly because the cloud is difficult to visualize. Herein, we study the onset of the cavitation cloud using high-speed X-ray phase contrast imaging. The imaging reveals that the cone does not form immediately but develops through three steps: (i) inception and oscillation of single bubbles, (ii) formation of individual clouds through bubble splitting or lens effects, and (iii) merging of these clouds into a bubble layer, which eventually takes on the cone structure because of the radial pressure gradient at the horn tip.