ABSTRACT Analysing next-generation cosmological data requires balancing accurate modelling of non-linear gravitational structure formation and computational demands. We propose a solution by introducing a machine learning-based field-level emulator, within the Hamiltonian Monte Carlo-based Bayesian Origin Reconstruction from Galaxies (BORG) inference algorithm. Built on a V-net neural network architecture, the emulator enhances the predictions by first-order Lagrangian perturbation theory to be accurately aligned with full N-body simulations while significantly reducing evaluation time. We test its incorporation in BORG for sampling cosmic initial conditions using mock data based on non-linear large-scale structures from N-body simulations and Gaussian noise. The method efficiently and accurately explores the high-dimensional parameter space of initial conditions, fully extracting the cross-correlation information of the data field binned at a resolution of $1.95\,h^{-1}$ Mpc. Percent-level agreement with the ground truth in the power spectrum and bispectrum is achieved up to the Nyquist frequency $k_\mathrm{N} \approx 2.79h \,\, \mathrm{Mpc}^{-1}$. Posterior resimulations – using the inferred initial conditions for N-body simulations – show that the recovery of information in the initial conditions is sufficient to accurately reproduce halo properties. In particular, we show highly accurate $M_{200\mathrm{c}}$ halo mass function and stacked density profiles of haloes in different mass bins $[0.853,16]\times 10^{14}\,{\rm M}_{\odot }\,h^{-1}$. As all available cross-correlation information is extracted, we acknowledge that limitations in recovering the initial conditions stem from the noise level and data grid resolution. This is promising as it underscores the significance of accurate non-linear modelling, indicating the potential for extracting additional information at smaller scales.