In this paper, we establish a workflow for estimating built-up density and height from multispectral Sentinel-2 data. To do so, we frame the estimation of built-up density and height as a supervised learning problem. Because both target variables are measured on a ratio scale, the task is a regression problem: finding the mapping between an input vector, i.e., ubiquitously available features computed from Sentinel-2 data, and an observed output (i.e., the training set), which is derived automatically over spatially limited areas. Training sets are generated automatically from a joint exploitation of TanDEM-X mission elevation data and Sentinel-2 imagery and, as an alternative, from cadastral sources. The training sets are used to regress the target variables for spatial processing units that correspond to urban neighborhood scales. From a methodological point of view, we introduce a novel ensemble regression approach, multistrategy ensemble regression (MSER), based on advanced machine learning regression algorithms, including Random Forest Regression, Support Vector Regression, Gaussian Process Regression, and Neural Network Regression. To establish a robust ensemble, those algorithms are trained with a modified version of the AdaBoost.RT algorithm. To ensure diversity between the individual boosted regressors, we additionally include a random feature subspace method in the procedure. In contrast to existing approaches, we selectively prune unfavorable regressors trained during the boosting procedure and calculate the final prediction as a weighted mean over the remaining models to improve prediction accuracy. Finally, the outputs are combined into a single prediction with a decision fusion strategy. Experimental results are obtained from four test areas that cover the settlement areas of the four largest German cities, i.e., Berlin, Hamburg, Munich, and Cologne.
The results underline the beneficial properties of the MSER approach: in a comparative setup, all best predictions were obtained with a boosted regressor in conjunction with a decision fusion strategy. The mean absolute errors of the corresponding models vary between 3% and 16% for built-up density and between 1 m and 5.4 m for height, depending on the validation strategy, the size of the spatial processing units, and the test area. In a domain adaptation setup (i.e., when learning a model over a source domain and applying it over a geographically different target domain), numerous predictions also show accuracy levels comparable to those of predictions obtained within a source domain. This further underlines the viability of transferring a model and, thus, of substituting the training data in the target domains.
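To make the boosting procedure summarized above more concrete, the following minimal Python sketch combines an AdaBoost.RT-style loop with a random feature subspace, pruning, and a weighted mean. It is an illustration under stated assumptions, not the implementation used in this paper: it follows AdaBoost.RT as commonly described (relative-error threshold `phi`, weight `log(1/beta)`) rather than the modified version, the base learner is a toy single-feature line standing in for the actual regressors, the pruning rule (keep the highest-weighted fraction) is hypothetical, and all function and parameter names are invented for illustration. Targets are assumed to be strictly positive because the relative error divides by them. The decision fusion across different regressor types is not covered.

```python
import math
import random


def fit_line(xs, ys, ws):
    """Weighted least-squares fit of y = a + b * x; returns (a, b)."""
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    b = sxy / sxx if sxx > 0 else 0.0
    return my - b * mx, b


def boost(X, y, rounds=10, phi=0.1, power=2, subspace=2, seed=0):
    """AdaBoost.RT-style loop; returns a list of (weight, feature, a, b)."""
    rng = random.Random(seed)
    m, d = len(X), len(X[0])
    dist = [1.0 / m] * m  # sample distribution, uniform at the start
    models = []
    for _ in range(rounds):
        best = None
        # Random feature subspace: only a random subset of features is tried.
        for j in rng.sample(range(d), subspace):
            col = [row[j] for row in X]
            a, b = fit_line(col, y, dist)
            # Absolute relative error per sample (assumes y != 0).
            are = [abs((a + b * x - t) / t) for x, t in zip(col, y)]
            # Error rate: total weight of samples whose error exceeds phi.
            eps = sum(w for w, e in zip(dist, are) if e > phi)
            if best is None or eps < best[0]:
                best = (eps, j, a, b, are)
        eps, j, a, b, are = best
        eps = min(max(eps, 1e-10), 1.0 - 1e-10)  # keep the logarithm finite
        beta = eps ** power
        models.append((math.log(1.0 / beta), j, a, b))
        # Down-weight well-predicted samples, then renormalize.
        dist = [w * (beta if e <= phi else 1.0) for w, e in zip(dist, are)]
        total = sum(dist)
        dist = [w / total for w in dist]
    return models


def prune(models, keep=0.5):
    """Hypothetical pruning rule: keep the highest-weighted fraction."""
    ranked = sorted(models, key=lambda mdl: mdl[0], reverse=True)
    return ranked[:max(1, math.ceil(keep * len(ranked)))]


def predict(models, row):
    """Weighted mean of the predictions of the given models."""
    total = sum(w for w, *_ in models)
    return sum(w * (a + b * row[j]) for w, j, a, b in models) / total


def mean_absolute_error(models, X, y):
    """Mean absolute error of the ensemble over a labeled set."""
    return sum(abs(predict(models, r) - t) for r, t in zip(X, y)) / len(y)
```

A typical call sequence would be `models = prune(boost(X, y))` followed by `predict(models, row)` for each spatial processing unit, where `X` is a list of feature rows and `y` the corresponding target values.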