Abstract

Study question
Which prediction models best estimate the chance of natural conception in couples with unexplained infertility?

Summary answer
The highest-quality clinical prediction models were those developed by Hunault et al. (2004), Van Eekelen et al. (2017) and McLernon et al. (2019).

What is known already
Although couples with unexplained infertility have a chance of natural conception, a formal estimate of prognosis is not part of routine clinical decision making in most settings. In the UK, NICE recommends IVF after two years of unexplained infertility in all couples, regardless of their individual chances of conception without treatment. This approach risks overtreatment for some whilst delaying access to IVF for others. Since the last systematic review on this topic 10 years ago, several models which aim to predict natural conception in infertile couples have been published, but few are in routine clinical use.

Study design, size, duration
We systematically searched OVID MEDLINE, OVID EMBASE and OVID PsycINFO for primary articles published between 1978 and 2022 that reported on the development and/or validation of models predicting spontaneous conception, pregnancy or live birth. No language or other restrictions were applied.

Participants/materials, setting, methods
We included couples with unexplained infertility or those with no major barriers to natural conception. Women who underwent any form of fertility treatment immediately after the initial diagnostic work up were excluded. The methodological quality of the included papers was assessed using criteria within the CHARMS checklist. Risk of bias was evaluated using the PROBAST tool. Discrimination and calibration results, which describe the performance of a prediction model, were also reported.

Main results and the role of chance
Eighteen publications reported on 23 prediction models with natural conception, pregnancy, ongoing pregnancy and live birth as outcomes. Of 11 studies involving model development, internal and external validation were reported in 4 and 2 publications respectively. Two studies published extended versions of the same model (Hunault et al., 2004), while 3 studies focussed on independent validation of existing models. The methodological rigour of the models has improved over time, as demonstrated by accuracy measures including discrimination and calibration. Three models (Hunault et al., 2004; Van Eekelen et al., 2017; McLernon et al., 2019) had low risk of bias. The static Hunault model can be used to determine the chance of conception at a single point in time, usually at the conclusion of the initial fertility work up, whereas the Van Eekelen and McLernon models are dynamic models that can estimate chances of conception at different time points. The discriminatory ability of the models ranged from 0.59 to 0.64 in the internal and external validation studies. The calibration slope ranged from 0.6 to 1.0 for Hunault's static model and, for the dynamic models, from 0.62 to 1.01 for Van Eekelen's model and from 0.65 to 1.06 for McLernon's model.

Limitations, reasons for caution
The quality of prediction models which predated the CHARMS checklist and PROBAST tool could not be adequately assessed. The populations in most models extended beyond unexplained infertility and included other couples with a chance of natural conception, e.g. those with mild male infertility or minimal endometriosis.

Wider implications of the findings
Of the 3 high-quality models, the Hunault model can only be used once, whereas the Van Eekelen and McLernon models have the potential to be used more flexibly, following further external validation in different settings and larger populations.

Trial registration number
Not applicable
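
Illustrative note on the performance measures
The discrimination and calibration measures reported under the main results can be illustrated with a minimal sketch. It rests on simplifying assumptions that are not taken from the reviewed studies: the outcome is treated as binary (conception yes/no within a fixed period) rather than as time-to-event, discrimination is computed as a pairwise concordance (c) statistic, and the calibration slope is obtained by regressing the observed outcome on the logit of the predicted probability. The function names and the toy cohort are hypothetical.

```python
import math


def c_statistic(probs, outcomes):
    """Share of (event, non-event) pairs in which the event has the higher
    predicted probability; tied predictions count as half a pair."""
    events = [p for p, y in zip(probs, outcomes) if y == 1]
    non_events = [p for p, y in zip(probs, outcomes) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    score = 0.0
    for p_event in events:
        for p_non in non_events:
            if p_event > p_non:
                score += 1.0
            elif p_event == p_non:
                score += 0.5
    return score / (len(events) * len(non_events))


def calibration_slope(probs, outcomes, iterations=25):
    """Fit logit(P(y = 1)) = a + b * logit(p) by Newton-Raphson and return b.
    A slope near 1 suggests well-spread predictions; a slope below 1 suggests
    predictions that are too extreme. Probabilities must lie strictly in (0, 1)."""
    x = [math.log(p / (1 - p)) for p in probs]
    a, b = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, outcomes):
            mu = 1 / (1 + math.exp(-(a + b * xi)))
            w = mu * (1 - mu)
            g0 += yi - mu            # gradient w.r.t. intercept
            g1 += (yi - mu) * xi     # gradient w.r.t. slope
            h00 += w                 # information matrix entries
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return b


def toy_cohort(groups):
    """Expand (predicted probability, n couples, n conceptions) triples
    into per-couple lists. The data are invented for illustration."""
    probs, outcomes = [], []
    for p, n, n_events in groups:
        probs += [p] * n
        outcomes += [1] * n_events + [0] * (n - n_events)
    return probs, outcomes


if __name__ == "__main__":
    # Invented cohort: predictions of 10%, 50% and 90%, observed rates of 20%, 50% and 80%.
    probs, outcomes = toy_cohort([(0.1, 10, 2), (0.5, 10, 5), (0.9, 10, 8)])
    print("c-statistic:", round(c_statistic(probs, outcomes), 3))
    print("calibration slope:", round(calibration_slope(probs, outcomes), 3))
```

In this invented cohort the predictions are more extreme than the observed rates, so the slope falls below 1, the same direction as the lower end of the ranges reported above. The reviewed models are time-to-event models, so the published figures would have been derived with survival-analysis counterparts of these measures rather than with this binary simplification.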