Principal component transport-based data-driven reduced-order models (PC-transport ROM) are being increasingly adopted as combustion models for turbulent reactive flows to mitigate the computational cost associated with incorporating detailed chemical kinetics. Previous studies were mainly limited to replicating relatively simple chemistry in canonical configurations. The objective of the present study, therefore, is to further explore the accuracy of the PC-transport ROM on more complex combustion phenomena where, for example, large hydrocarbon fuel chemistry spanning a broad range of thermochemical space governs sequential multi-stage compression ignition processes. The cumulative error of PC-transport for this problem, and for others that depend upon sequential highly nonlinear physics, has to be minimal because the combustion phasing and heat release rate in internal combustion engines depend upon accurate predictions of minor ignition species whose concentrations start from zero and grow by orders of magnitude over the course of low- and high-temperature autoignition. Specifically, the PC-transport ROM is applied to predict the compression ignition characteristics of lean n-heptane/air and primary reference fuel (PRF)/air mixtures in a two-dimensional (2-D) constant volume computational domain initialized with a two-dimensional isotropic turbulence spectrum and temperature inhomogeneities. Principal component analysis (PCA) is used to define the low-dimensional manifold that represents the original thermochemical state vector, and artificial neural network (ANN) models are adopted to tabulate chemical kinetics, transport, and thermodynamic properties. A series of 2-D pseudo-turbulent simulations is performed at engine pressures by varying the initial mean and r.m.s. of temperature, the turbulence intensity, and the composition of the fuel/air mixture.
The results show that the PC-transport ROM accurately reproduces the instantaneous and statistical ignition characteristics of the fuel/air mixture, aided by pre-processing techniques including species subsetting, data clustering, and data transformation. It is found that the PCs are not properly scaled with a power transformer if reactants are included in the species subset, which decreases the accuracy of the PC-transport ROM. Separating the reactants from the species subset ensures that the temporal evolution of the PCs starts from zero and spans orders of magnitude with time; as such, this approach is found to effectively redistribute both the PCs and their source terms with a power transformer. The computational speed-up factor of the PC-transport ROM ranges from 5.1 for the n-heptane/air cases to 15.0 for the PRF/air cases. Moreover, a potential further speed-up is anticipated through a combination of reduced grid resolution requirements and reduced stiffness of the chemical system. More broadly, many of the pre-processing methods developed for inhomogeneous compression ignition may also apply to other complex intermittent combustion phenomena.

Novelty and significance statement
• The PCA-based reduced-order model (PC-transport ROM) has been applied to the multi-stage compression ignition of large hydrocarbon fuels under HCCI-relevant conditions. The present work presents a systematic procedure to accurately capture the two-stage ignition behavior of lean n-heptane/air and PRF50/air mixtures.
• The present work demonstrates an advantage of the PC-transport ROM in terms of computational speed-up. The speed-up factor for the ROM is up to 15, and a potential additional speed-up is anticipated through the reduction in the required spatial and temporal resolution.
• A series of 2-D PC-transport ROM simulations is conducted to demonstrate the robustness of the ROM. A limitation of the ROM under different operating conditions is also discussed.
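
The PCA step summarized in the abstract can be illustrated with a minimal sketch. It assumes plain NumPy, standard-deviation scaling, and synthetic data; the function names (`fit_pca`, `to_pcs`, `pc_sources`, `from_pcs`) are hypothetical and are not taken from the study. The species subsetting, clustering, power transformation, and ANN tabulation described above are deliberately omitted. The sketch only shows how a scaled state vector and its source terms could be projected onto a truncated PC basis.

```python
import numpy as np

def fit_pca(X, n_pcs):
    """Fit PCA on a (samples x variables) state matrix.

    Returns the column means, column scales, and the (variables x n_pcs)
    basis of leading eigenvectors. Std scaling is an assumption here.
    """
    mean = X.mean(axis=0)
    scale = X.std(axis=0)
    scale[scale == 0.0] = 1.0            # guard against constant columns
    Xs = (X - mean) / scale
    cov = np.cov(Xs, rowvar=False)       # covariance of the scaled data
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]    # sort by decreasing variance
    A = eigvecs[:, order[:n_pcs]]
    return mean, scale, A

def to_pcs(X, mean, scale, A):
    """Project states onto the retained principal components."""
    return ((X - mean) / scale) @ A

def pc_sources(S, scale, A):
    """Project state source terms; centering drops out for rates."""
    return (S / scale) @ A

def from_pcs(Z, mean, scale, A):
    """Approximately reconstruct the original states from the PCs."""
    return (Z @ A.T) * scale + mean

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.random((200, 6))             # synthetic stand-in for state data
    mean, scale, A = fit_pca(X, n_pcs=3)
    Z = to_pcs(X, mean, scale, A)
    err = np.abs(from_pcs(Z, mean, scale, A) - X).max()
    print("PC matrix shape:", Z.shape, "max reconstruction error:", err)
```

Because the projection is linear, the PC source terms follow from the same basis as the PCs themselves, which is why a transformation applied to the PCs also affects how their source terms are distributed.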