Water Resources Research

RESEARCH ARTICLE
10.1002/2016WR018896

Key Points:
• A machine-learning algorithm for optimizing snow sensor placements is compared to expert placements in an existing sensor network
• The spatial and temporal transferability of the algorithm is then assessed in 14 total LiDAR surveys in two uninstrumented catchments
• The accuracy of the snow depth estimated from the sensor measurements is higher than expert and randomized placements in each of the surveys

Correspondence to:
C. A. Oroza, coroza@berkeley.edu

Citation:
Oroza, C. A., Z. Zheng, S. D. Glaser, and D. Tuia (2016), Optimizing embedded sensor network design for catchment-scale snow-depth estimation using LiDAR and machine learning, Water Resour. Res., 52, 8174–8189, doi:10.1002/2016WR018896.

Received 11 MAR 2016
Accepted 23 SEP 2016
Accepted article online 27 SEP 2016
Published online 22 OCT 2016

Optimizing embedded sensor network design for catchment-scale snow-depth estimation using LiDAR and machine learning

Carlos A. Oroza¹, Zeshi Zheng¹, Steven D. Glaser¹, Devis Tuia², and Roger C. Bales¹,³

¹Department of Civil and Environmental Engineering, University of California, Berkeley, California, USA
²Department of Geography, University of Zurich, Zurich, Switzerland
³Sierra Nevada Research Institute, University of California, Merced, California, USA

Abstract
We evaluate the accuracy of a machine-learning algorithm that uses LiDAR data to optimize ground-based sensor placements for catchment-scale snow measurements. Sampling locations that best represent catchment physiographic variables are identified with the Expectation Maximization algorithm for a Gaussian mixture model. A Gaussian process is then used to model the snow depth in a 1 km² area surrounding the network, and additional sensors are placed to minimize the model uncertainty. The aim of the study is to determine the distribution of sensors that minimizes the bias and RMSE of the model.
We compare the accuracy of the snow-depth model using the proposed placements to an existing sensor network at the Southern Sierra Critical Zone Observatory. Each model is validated with a 1 m² LiDAR-derived snow-depth raster from 14 March 2010. The proposed algorithm exhibits higher accuracy with fewer sensors (8 sensors, RMSE 38.3 cm, bias = 3.49 cm) than the existing network (23 sensors, RMSE 53.0 cm, bias = 15.5 cm) and randomized placements (8 sensors, RMSE 63.7 cm, bias = 24.7 cm). We then evaluate the spatial and temporal transferability of the method using 14 LiDAR scenes from two catchments within the JPL Airborne Snow Observatory. In each region, the optimized sensor placements are determined using the first available snow raster for the year. The accuracy in the remaining LiDAR surveys is then compared to 100 configurations of sensors selected at random. We find the error statistics (bias and RMSE) to be more consistent across the additional surveys than the average random configuration.

1. Introduction

A major challenge of hydrologic science in montane regions relates to estimating the spatial variability of snow cover [Bales et al., 2006; Trujillo and Lehning, 2015; McCreight et al., 2014; Guan et al., 2013]. Multiple independent variables govern the distribution of snow cover, including elevation, slope, aspect, and the distribution of canopy [Faria et al., 2000; Musselman et al., 2008; Lehning et al., 2011; Helfricht et al., 2014]. Nonstationary effects such as climate warming and changes in vegetation structure may significantly alter the timing and magnitude of storage and runoff in these watersheds [Goulden and Bales, 2014; Flanner et al., 2009].
Existing regression-based hydrologic models, which use statistical relations from historical hydrographs to predict runoff and inform allocation decisions [Perkins et al., 2009; Rosenberg et al., 2011; Rango and Martinec, 1995], will have limited accuracy as conditions deviate from historical norms and thus may prove to be inadequate for predictions in water management.

© 2016. American Geophysical Union. All Rights Reserved.

Recent research has focused on improving hydrologic models by assimilating remote-sensing and in situ measurements with distributed energy-balance models to better estimate storage and runoff [Guan et al., 2013]. These methods use well-developed remote-sensing [Painter et al., 2003; Rosenthal and Dozier, 1996; Dozier, 1989; Pälli et al., 2002; Egli et al., 2012] and energy-balance models [Marks et al., 1992; Link and Marks, 1999; Brubaker et al., 1996] to estimate snow and snowmelt processes across basins. In situ measurements for these methods are presently limited to snow pillows and snow courses, which largely sample flat, open terrain [Molotch and Bales, 2006], yet the distribution of snow cover can vary considerably as a function of topographic features. To address this, in situ sensor measurements can be deployed to capture the mean and variance of the snow depth, which can be used to inform models that use these statistics as inputs [e.g., Essery and Pomeroy, 2004]. Alternatively, individual sensor measurements can be used together with