In recent years, deep learning (DL) architectures have been used to learn the nonlinear relationships between parameters in seismic inversion problems, with the aim of better analysing the subsurface, for example through improved velocity model building (VMB). In this study, we focus on deep-learning-based inversion (DLI) for VMB, using a conditional generative adversarial network (PIX2PIX) with a ResNet-9 generator, together with a comprehensive mathematical methodology for generating samples of multi-stratified heterogeneous velocity models to train the DLI architecture. We demonstrate that the proposed architecture can achieve state-of-the-art performance in reconstructing velocity models from only one seismic shot, thus reducing cost and computational complexity. We also demonstrate, through quantitative and qualitative evaluation, that the proposed solution generalises across linear multi-layer models, curved or folded structures, structures with salt bodies, and higher-resolution structures built from geological images.