Statistical modeling of a nonstationary spatial extremal dependence structure is challenging. Parametric max-stable processes (MSPs) are common choices for modeling spatially indexed block maxima, where an assumption of stationarity is usual to make inference feasible. However, this assumption is unrealistic for data observed over a large or complex domain. We develop a computationally efficient method for estimating extremal dependence using a globally nonstationary but locally stationary MSP construction, with the spatial domain divided into a fine grid of subregions, each with its own dependence parameters. We use LASSO ( L 1 ) or ridge ( L 2 ) penalties to obtain spatially smooth parameter estimates. We then develop a novel data-driven algorithm to merge homogeneous neighboring subregions. The algorithm facilitates model parsimony and interpretability. To make our model suitable for high-dimensional data, we exploit a pairwise likelihood to perform inference and discuss its computational and statistical efficiency. We apply our proposed method to model monthly maximum temperature data at over 1400 sites in Nepal and the surrounding Himalayan and sub-Himalayan regions; we show significant improvements in model fit compared to a stationary model. Furthermore, we demonstrate that the estimated merged partition is interpretable from a geographic perspective and leads to better model diagnostics by adequately reducing the number of parameters.