Satellite sensors have provided a huge volume of observations of Earth's environment at varying spatial and temporal resolutions. Many global coarse-resolution land products have been generated from single-satellite data, but global, temporally regular land products at fine spatial resolutions (e.g., 10–30 m) are scarce because observations at these resolutions are infrequent. An ideal inversion framework would estimate multiple global, continuous land variables at different spatial resolutions by combining all sources of satellite data. This paper proposes a new framework, based on a multi-scale and multi-depth convolutional neural network (MSDCNN), that simultaneously estimates five land variables from the top-of-atmosphere (TOA) reflectance acquired by seven satellite sensors. The framework enables estimation of temporally regular land variables at spatial resolutions as fine as 10 m by transforming information from satellite data at coarser spatial resolutions. The estimated land variables are leaf area index (LAI), fraction of absorbed photosynthetically active radiation (FAPAR), shortwave albedo, visible albedo, and spectral reflectance. The seven sensors, which acquire data at different spatial resolutions, are the Visible Infrared Imaging Radiometer Suite (VIIRS) (750 m), Moderate Resolution Imaging Spectroradiometer (MODIS) (500 m), Fengyun-3 (FY-3) Medium Resolution Spectral Imager (MERSI) (250 m), China-Brazil Earth Resources Satellite 04 (CBERS-04) Wide Field Imager (WFI) (73 m), Landsat 8 Operational Land Imager (OLI) (30 m), Gaofen-1 (GF-1) Wide Field of View (WFV) (16 m), and Sentinel-2A/B MultiSpectral Instrument (MSI) (10 m). The framework consists of four main steps. First, a Shuffled Complex Evolution (SCE) optimization method is adopted to estimate these variables from VIIRS TOA reflectance. Second, a joint-output random forest (RF) regression method is used to link the satellite observations to the values estimated in the first step.
Third, the six types of multi-resolution satellite observations are used to downscale the VIIRS TOA reflectance to six different spatial resolutions with the MSDCNN. Finally, the downscaled VIIRS TOA reflectance at the six resolutions is fed into the multi-variable RF model to estimate the land variables. The framework was validated against high-resolution reference maps from the ImagineS network and time-series shortwave albedo field measurements from the Surface Radiation (SURFRAD) and Integrated Carbon Observation System (ICOS) networks. The retrieved variables showed high accuracy, with root mean square error (RMSE) ranges of 0.361–0.489 (LAI), 0.023–0.120 (FAPAR), and 0.013–0.026 (snow-free shortwave albedo). A comparison of the retrieved multi-scale variables with the Sentinel-2A/B (10 m), Landsat 8 (30 m), Global LAnd Surface Satellite (GLASS; 250 m and 500 m), MODIS (500 m), and VIIRS (750 m) products shows close agreement, with RMSE ranges of 0.107–0.273 (LAI), 0.015–0.027 (FAPAR), 0.003–0.007 (shortwave albedo), 0.001–0.007 (visible albedo), and 0.003–0.025 (spectral reflectance). Together, the direct validation and the product intercomparison indicate that this framework has the potential to estimate global land variables at various spatial scales from a variety of satellite data sources.
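The joint-output RF regression in the second step, together with the RMSE metric used for validation, can be illustrated with a minimal sketch. It assumes scikit-learn's `RandomForestRegressor` as the RF implementation (the abstract does not name one) and uses synthetic data: the band count, sample sizes, target relationship, and all variable names are hypothetical and do not reproduce the paper's configuration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Synthetic stand-ins (assumptions, not the paper's data): 500 pixels,
# 6 TOA reflectance bands as predictors, and 3 targets playing the role
# of variables such as LAI, FAPAR, and shortwave albedo.
n_pixels, n_bands, n_targets = 500, 6, 3
toa = rng.uniform(0.0, 0.6, size=(n_pixels, n_bands))
weights = rng.uniform(0.0, 1.0, size=(n_bands, n_targets))
targets = toa @ weights + rng.normal(0.0, 0.01, size=(n_pixels, n_targets))

# Hold out the last 100 pixels for evaluation.
train, test = slice(0, 400), slice(400, None)

# One forest fitted on a 2-D target array predicts all variables jointly.
model = RandomForestRegressor(n_estimators=50, random_state=0)
model.fit(toa[train], targets[train])
predicted = model.predict(toa[test])

# Per-variable RMSE on the held-out pixels.
rmse = np.sqrt(np.mean((predicted - targets[test]) ** 2, axis=0))
print(predicted.shape, rmse)
```

Fitting a single forest on all targets at once means each tree's splits are chosen to reduce the error across every output together, which is what distinguishes a joint-output model from training one forest per variable.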