Ultrasound imaging is widely used in medicine for its safety and affordability. However, it demands large transducer arrays because the image resolution is proportional to the number of elements, N, typically around 128 for 2D imaging. Three-dimensional imaging requires N² audio channels (≈350 GB/s of data) if the equivalent 2D matrix array is used, which is practically impossible to process. To solve this issue, row-column arrays (RCAs) aggregate rows and columns of elements, reducing the data rate and processing demands by a factor of N. A novel dual-stage beamforming algorithm further reduces the number of beamforming operations by a factor of N/2, with negligible impact on image quality. For N = 128, the processing is therefore N × N/2 = 8192 times faster than with a matrix array, and it is hypothesized that 3D RCA beamforming can be done in real time using a commodity graphics card. The beamforming rate of an NVIDIA RTX 4090 GPU was measured for in-vivo data from a rat kidney, achieving 1394 full volumes per second, which is over 150 times faster than previous implementations. Combining RCAs with the new beamforming algorithm and GPU processing thus enables volumetric beamforming to be done affordably at the bedside in real time using a standard scanner and PC.
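
The scaling figures quoted above can be checked with a few lines of arithmetic. The sketch below is only an illustration, not the authors' code: the function name `rca_scaling` and its return layout are hypothetical, and the two reduction factors (N for the RCA, N/2 for dual-stage beamforming) are taken directly from the abstract rather than derived.

```python
def rca_scaling(n):
    """Return (matrix_channels, combined_reduction) for an N-element aperture.

    Hypothetical helper: the factors are the ones stated in the abstract,
    not derived from a beamforming model.
    """
    matrix_channels = n * n        # channels of the equivalent 2D matrix array
    rca_factor = n                 # reduction attributed to the row-column array
    dual_stage_factor = n // 2     # further reduction from dual-stage beamforming
    return matrix_channels, rca_factor * dual_stage_factor


n = 128
channels, speedup = rca_scaling(n)
print(channels)  # 16384
print(speedup)   # 8192

# Time available per volume at the measured rate of 1394 volumes per second.
volume_time_ms = 1000.0 / 1394.0
print(round(volume_time_ms, 2))  # 0.72
```

At the reported rate, each full volume is beamformed in roughly 0.72 ms.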