Abstract

Satellite-based precipitation products are commonly evaluated against gauge measurements, yet their regional performance and hydrological applicability have not been sufficiently studied, especially for dry basins. In this study, we evaluated four state-of-the-art remotely sensed precipitation products (CMORPH, GSMaP, IMERG, and PERSIANN-CDR) and ensemble products derived from them with the reliability ensemble averaging and three-cornered hat methods over the Heihe River basin, northwest China. We conducted both a direct evaluation against gauge measurements during 2001–19 and an indirect evaluation using the Soil and Water Assessment Tool (SWAT) model during 2001–10. Our results showed that 1) in the point-to-pixel evaluation, the GSMaP and IMERG products, which have high spatial resolution, effectively captured the quantile distribution of the gauge data; 2) compared with the spatially interpolated gauge data, all products underestimated precipitation, and GSMaP reproduced the observed interannual variability most closely; 3) the products detected precipitation better in the upstream region and during the rainy season, indicating that their performance was affected by rain intensity, with GSMaP exhibiting the best detection ability; 4) the spatial patterns of the individual products were inconsistent with one another, whereas the ensemble products reduced the bias relative to the gauge data; and 5) for hydrological modeling, the streamflow simulation driven by GSMaP performed best, and the ensemble precipitation from the three-cornered hat method outperformed that from the reliability ensemble averaging method. Collectively, these findings illustrate the reliability of GSMaP in representing precipitation characteristics in similar arid areas and highlight the advantages of the three-cornered hat method.
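
The abstract names the three-cornered hat method without spelling it out. As a rough illustration only, the sketch below implements the classical three-product form, which estimates each product's error variance from the variances of pairwise differences, assuming the errors are mutually uncorrelated. The study merges four products, which calls for a generalized formulation not shown here. The inverse-variance weighting, the function names, and the synthetic data are assumptions made for illustration, not the authors' procedure.

```python
import numpy as np


def three_cornered_hat(a, b, c):
    """Classical three-cornered hat for three collocated series.

    Returns the estimated error variance of each series, assuming the
    errors of the three series are mutually uncorrelated.
    """
    a, b, c = (np.asarray(x, dtype=float) for x in (a, b, c))
    # The unknown true signal cancels in each pairwise difference.
    v_ab = np.var(a - b, ddof=1)
    v_ac = np.var(a - c, ddof=1)
    v_bc = np.var(b - c, ddof=1)
    var_a = 0.5 * (v_ab + v_ac - v_bc)
    var_b = 0.5 * (v_ab + v_bc - v_ac)
    var_c = 0.5 * (v_ac + v_bc - v_ab)
    return np.array([var_a, var_b, var_c])


def inverse_variance_weights(error_vars, floor=1e-12):
    """Hypothetical merging step: weights proportional to 1 / error variance."""
    # Finite samples can give tiny or negative estimates, so clip them first.
    v = np.maximum(np.asarray(error_vars, dtype=float), floor)
    w = 1.0 / v
    return w / w.sum()


# Synthetic demonstration (not data from the study).
rng = np.random.default_rng(0)
n = 5000
truth = rng.gamma(shape=0.8, scale=3.0, size=n)   # stand-in "true" precipitation
prod_a = truth + rng.normal(0.0, 0.5, n)          # smallest error
prod_b = truth + rng.normal(0.0, 1.0, n)
prod_c = truth + rng.normal(0.0, 2.0, n)          # largest error

err_vars = three_cornered_hat(prod_a, prod_b, prod_c)
weights = inverse_variance_weights(err_vars)
merged = weights[0] * prod_a + weights[1] * prod_b + weights[2] * prod_c

print("estimated error variances:", err_vars)
print("merging weights:", weights)
```

No reference data enter the variance estimates, which is the method's appeal where gauges are sparse. The cost is the uncorrelated-error assumption, which satellite products that share sensors or retrieval algorithms may not satisfy.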