Historically, observing snow depth over large areas has been difficult. When snow depth observations are sparse, regression models can be used to infer the snow depth over a given area. Data sparsity has also left many important questions about such inference unexamined. Improved inference, or estimation, of snow depth and its spatial distribution from a given set of observations can benefit a wide range of applications, from water resource management to ecological studies to validation of satellite estimates of snow pack. The development of Light Detection and Ranging (LiDAR) technology has provided non-sparse snow depth measurements, which we use in this study to address fundamental questions about snow depth inference using both sparse and non-sparse observations. For example, when are more data needed and when are data redundant? Results apply both to traditional manual snow depth measurements and to LiDAR observations. Through sampling experiments on high-resolution LiDAR snow depth observations at six separate 1.17-km² sites in the Colorado Rocky Mountains, we provide novel perspectives on a variety of issues affecting the regression estimation of snow depth from sparse observations. We measure the effects of observation count, random selection of observations, quality of predictor variables, and cross-validation procedures using three skill metrics: percent error in total snow volume, root mean squared error (RMSE), and R². Extremes of predictor quality are used to understand the range of its effect: how do predictors downloaded from the internet perform against more accurate predictors measured by LiDAR? Whereas cross validation remains the only option for validating inference from sparse observations, in our experiments the full set of LiDAR-measured snow depths can be considered the ‘true’ spatial distribution and used to understand cross-validation bias at the spatial scale of inference.
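As an illustration of the three skill metrics named above, the following is a minimal sketch, not the authors' code. It assumes predicted and 'true' snow depths are given as equal-length lists on a grid of equal-area cells, and it assumes R² is the coefficient of determination (the abstract does not say whether R² is this quantity or a squared correlation). The function names and the toy depths are hypothetical.

```python
import math

def percent_volume_error(predicted, observed, cell_area_m2=900.0):
    """Percent error in total snow volume over equal-area grid cells.

    cell_area_m2 defaults to 900 m^2 (a 30 m x 30 m cell); with equal-area
    cells it cancels out of the percentage and is kept only for clarity.
    """
    true_volume = sum(observed) * cell_area_m2
    estimated_volume = sum(predicted) * cell_area_m2
    return 100.0 * (estimated_volume - true_volume) / true_volume

def rmse(predicted, observed):
    """Root mean squared error between predicted and observed depths."""
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n)

def r_squared(predicted, observed):
    """Coefficient of determination (assumed definition of R^2)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for p, o in zip(predicted, observed))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

# Hypothetical toy depths in metres, not data from the study.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 1.5, 3.5, 3.5]

print(percent_volume_error(predicted, observed))  # 0.0
print(rmse(predicted, observed))                  # 0.5
print(r_squared(predicted, observed))             # 0.8
```

In this toy case the over- and under-estimates cancel in the volume total, so the percent volume error is zero while RMSE and R² still register the cell-by-cell mismatch.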
We model at the 30-m resolution of readily available predictors, which is a popular spatial resolution in the literature. Three regression models are also compared, and we briefly examine how sampling design affects model skill. Results quantify the primary dependence of each skill metric on observation count, which ranges over three orders of magnitude, doubling at each step from 25 up to 3200. Whereas uncertainty (resulting from random selection of observations) in percent error of true total snow volume is typically well constrained by 100–200 observations, there is considerable uncertainty in the inferred spatial distribution (R²) even at medium observation counts (200–800). We show that percent error in total snow volume is not sensitive to predictor quality, although RMSE and R² (measures of spatial distribution) often depend critically on it. Inaccuracies of downloaded predictors (most often the vegetation predictors) can easily require a quadrupling of observation count to match the RMSE and R² scores obtained with LiDAR-measured predictors. Under cross validation, the RMSE and R² skill measures are consistently biased towards poorer results than their true validations. This is primarily a result of greater variance at the spatial scales of the point observations used for cross validation than at the 30-m resolution of the model. The magnitude of this bias depends on individual site characteristics, observation count (for our experimental design), and sampling design. Sampling designs that maximize independent information maximize cross-validation bias but also maximize true R². The bagging tree model is found to generally outperform the other regression models in the study on several criteria. Finally, we discuss and recommend the use of LiDAR in conjunction with regression modelling to advance understanding of snow depth spatial distribution at spatial scales of thousands of square kilometres. Copyright © 2012 John Wiley & Sons, Ltd.