Abstract. Biomass burning smoke is advected over the southeastern Atlantic Ocean between July and October of each year. This smoke plume overlies and mixes into a region of persistent low marine clouds. Model calculations of climate forcing by this plume vary significantly in both magnitude and sign. NASA EVS-2 (Earth Venture Suborbital-2) ORACLES (ObseRvations of Aerosols above CLouds and their intEractionS) deployed for field campaigns off the west coast of Africa in three consecutive years (September 2016, August 2017, and October 2018) with the goal of better characterizing this plume as a function of its monthly evolution by measuring the parameters necessary to calculate the direct aerosol radiative effect. Here, this dataset and satellite retrievals of cloud properties are used to test the representation of the smoke plume and the underlying cloud layer in two regional models (WRF-CAM5 and CNRM-ALADIN) and two global models (GEOS and UM-UKCA). The focus is on the comparisons of those aerosol and cloud properties that are the primary determinants of the direct aerosol radiative effect and on the vertical distribution of the plume and its properties. The representativeness of the observations relative to monthly averages is tested for each field campaign, with the sampled mean aerosol light extinction generally found to be within 20 % of the monthly mean at plume altitudes. When compared to the observations, in all models, the simulated plume is too vertically diffuse and has smaller vertical gradients, and in two of the models (GEOS and UM-UKCA), the plume core is displaced lower than in the observations. Plume carbon monoxide, black carbon, and organic aerosol masses indicate underestimates in modeled plume concentrations, leading, in general, to underestimates in mid-visible aerosol extinction and optical depth. Biases in mid-visible single scatter albedo are both positive and negative across the models.
Observed vertical gradients in single scatter albedo are not captured by the models, but the models do capture the coarse temporal evolution, correctly simulating higher values in October (2018) than in August (2017) and September (2016). Uncertainties in the measured absorption Ångström exponent were large but propagate into a negligible (<4 %) uncertainty in integrated solar absorption by the aerosol and, therefore, in the direct aerosol radiative effect. Model biases in cloud fraction, and, therefore, the scene albedo below the plume, vary significantly across the four models. The optical thickness of clouds is, on average, well simulated in the WRF-CAM5 and ALADIN models in the stratocumulus region and is underestimated in the GEOS model; UM-UKCA simulates cloud optical thickness that is significantly too high. Overall, the study demonstrates the utility of repeated, semi-random sampling across multiple years that can give insights into model biases and how these biases affect modeled climate forcing. The combined impact of these aerosol and cloud biases on the direct aerosol radiative effect (DARE) is estimated using a first-order approximation for a subset of five comparison grid boxes. A significant finding is that the observed grid box average aerosol and cloud properties yield a positive (warming) DARE for all five grid boxes, whereas DARE using the grid-box-averaged modeled properties ranges from much larger positive values to small, negative values. It is shown quantitatively how model biases can offset each other, so that model improvements that reduce biases in only one property (e.g., single scatter albedo but not cloud fraction) would lead to even greater biases in DARE. Across the models, biases in aerosol extinction and in cloud fraction and optical depth contribute the largest biases in DARE, with aerosol single scatter albedo also making a significant contribution.
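To illustrate how aerosol and cloud properties combine in a first-order DARE estimate, and why biases in them can offset each other, the following is a minimal Python sketch. It is not the approximation used in the study, which this abstract does not specify. It assumes a single-layer expression of the Haywood and Shine (1995) type, with the underlying surface reflectance replaced by a scene albedo built from cloud fraction and cloud albedo, and a conservative-scattering two-stream cloud albedo. All function names, default parameter values (upscatter fraction, atmospheric transmittance, ocean albedo, asymmetry parameter), and the example inputs are hypothetical and chosen only for illustration.

```python
def cloud_albedo_two_stream(cloud_optical_thickness, asymmetry=0.85):
    """Cloud albedo from a conservative-scattering two-stream approximation
    (an assumed stand-in, not taken from the study)."""
    scaled = (1.0 - asymmetry) * cloud_optical_thickness
    return scaled / (2.0 + scaled)


def scene_albedo(cloud_fraction, cloud_albedo, ocean_albedo=0.06):
    """Area-weighted albedo of the scene below the aerosol layer."""
    return cloud_fraction * cloud_albedo + (1.0 - cloud_fraction) * ocean_albedo


def dare_first_order(aod, ssa, scene_alb, upscatter=0.2,
                     solar_constant=1361.0, day_fraction=0.5, t_atm=0.8):
    """Top-of-atmosphere DARE in W m-2; positive values mean warming.

    The scattering term competes with the absorption term, which grows
    with the brightness of the underlying scene.
    """
    scattering_term = (1.0 - scene_alb) ** 2
    absorption_term = (2.0 * scene_alb / upscatter) * (1.0 / ssa - 1.0)
    return (-day_fraction * solar_constant * t_atm ** 2
            * ssa * upscatter * aod * (scattering_term - absorption_term))


if __name__ == "__main__":
    # Hypothetical grid-box values, NOT data from the study.
    cases = {
        "reference": dict(aod=0.5, ssa=0.85, cf=0.8, cot=10.0),
        "low AOD only": dict(aod=0.3, ssa=0.85, cf=0.8, cot=10.0),
        "low AOD, high cloud fraction": dict(aod=0.3, ssa=0.85, cf=1.0, cot=10.0),
    }
    for name, c in cases.items():
        albedo = scene_albedo(c["cf"], cloud_albedo_two_stream(c["cot"]))
        print(name, round(dare_first_order(c["aod"], c["ssa"], albedo), 1))
```

In this form DARE scales linearly with aerosol optical depth, while its sign depends on the balance between single scatter albedo and scene albedo: the same absorbing aerosol warms over a bright cloud deck and cools over dark ocean. A low bias in optical depth can therefore be partly masked by a high bias in cloud fraction, which is the kind of compensation the abstract describes.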