Introduction
The baseline endemicity profile of lymphatic filariasis (LF) is a key benchmark for planning control programmes, monitoring their impact on transmission and assessing the feasibility of achieving elimination. This work presents the modelled serological and parasitological prevalence of LF in Nigeria prior to the scale-up of mass drug administration (MDA), estimated using a machine learning-based approach.

Methods
LF prevalence data generated by the Nigeria Lymphatic Filariasis Control Programme during country-wide mapping surveys conducted between 2000 and 2013 were used to build the models. The dataset comprised 1103 community-level surveys based on the detection of filarial antigenemia using rapid immunochromatographic card tests (ICT) and 184 prevalence surveys testing for the presence of microfilariae (Mf) in blood. Using a suite of continuous gridded climate and environmental variables together with the compiled site-level prevalence data, a quantile regression forest (QRF) model was fitted separately for antigenemia and for microfilaraemia prevalence. Model predictions were projected across a continuous 5 × 5 km gridded map of Nigeria. The number of individuals potentially infected with LF prior to MDA interventions was subsequently estimated.

Results
The maps predict a heterogeneous distribution of LF antigenemia and microfilaraemia in Nigeria. The North-Central, North-West, and South-East regions displayed the highest predicted LF seroprevalence, whereas predicted Mf prevalence was highest in the southern regions. Overall, 8.7 million and 3.3 million infections were predicted for ICT and Mf, respectively.

Conclusions
QRF is a machine learning-based algorithm capable of handling high-dimensional data and fitting complex relationships between response and predictor variables. Our models provide a benchmark against which the progress of ongoing LF control efforts can be monitored.
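
The abstract does not state how the number of infected individuals was derived from the gridded predictions. One common approach, assumed here purely for illustration, is to multiply the predicted prevalence in each grid cell by a gridded population count for the same cell and sum over all cells. The sketch below shows that arithmetic on toy values; the function name `estimate_infections` and all numbers are hypothetical and are not taken from the study.

```python
from typing import Optional, Sequence


def estimate_infections(
    prevalence: Sequence[Optional[float]],
    population: Sequence[Optional[float]],
) -> float:
    """Sum prevalence * population over grid cells.

    Assumption (not from the source): each cell has a predicted prevalence
    expressed as a proportion in [0, 1] and a population count. Cells with
    a missing value in either input (None) are skipped.
    """
    if len(prevalence) != len(population):
        raise ValueError("prevalence and population must have the same length")

    total = 0.0
    for p, n in zip(prevalence, population):
        if p is None or n is None:
            continue  # masked cell, e.g. water body or no population data
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"prevalence must be a proportion, got {p}")
        total += p * n
    return total


# Toy example: four grid cells, one with a missing prevalence prediction.
prevalence = [0.10, 0.02, None, 0.25]   # hypothetical predicted proportions
population = [1200, 800, 500, 400]      # hypothetical people per cell
infected = estimate_infections(prevalence, population)
```

Because a QRF yields conditional quantiles as well as a central prediction, the same function could be applied to lower and upper quantile surfaces to obtain a rough range around the total, although summing cell-wise quantiles does not give a formal interval for the national figure.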