Abstract. Streamflow gauging stations not only track the pulse of rivers but also act as common reference points for hydrologic and other environmental analyses. As such, streamflow data and metadata on gauging stations – Geographic Information System (GIS) data on station locations, their upstream catchment boundaries and river flow networks – are critical for such analyses. However, for India's river basins, the availability of such data is limited; when available, the data are not in an analysis-ready format and can contain substantial errors. Studies often use the information available from India's water agencies as is, without checking its validity. This study addresses these limitations by building a new dataset from existing metadata (from the Central Water Commission, CWC, and the Water Resources Information System, WRIS) and checking it against publicly available information from global data sources (e.g., World Wildlife Fund, Multi-Error-Removed Improved-Terrain Hydro and Copernicus) and online maps (e.g., Google Maps). The quality control process categorizes existing metadata based on their consistency with these sources; existing metadata are also supplemented with additional information where needed. The new dataset developed here is called the “Geospatial dataset for Hydrologic analyses in India” (GHI) and uses Hydrological data and maps based on Shuttle Elevation Derivatives at multiple Scales (HydroSHEDS) as the underlying template. GHI contains both geospatial and time series information. In this initial version of GHI, the spatial domain includes only the river basins of Peninsular India where daily streamflow data are publicly available.
Following the quality control process, the CWC's 645 stations in Peninsular India were categorized into three groups: Group 1 (reliable metadata and adequate daily streamflow data; 213 stations), Group 2 (reliable metadata and inadequate or no daily streamflow data; 259 stations) and Group 3 (missing or unreliable metadata; 173 stations). For each of the 472 stations falling into Groups 1 and 2, catchment-specific annual and monthly time series spanning 71 water years (1950–2020) were compiled for the following: observed precipitation from the India Meteorological Department (IMD); observed streamflow from WRIS; estimated precipitation, evapotranspiration (ET) and streamflow from ERA5-Land; and ET from the Global Land Evaporation Amsterdam Model (GLEAM). A preliminary analysis of the catchment-scale time series indicates that, while the compiled data appear reasonable over most of the study domain, spurious runoff–precipitation ratios were observed in the hilly coastal regions of Western India. This adds yet another data-related obstacle to those already faced by the hydrologic community. In order to quantify historical changes and reconcile them with anticipated future changes, the community needs robust and reliable hydrographic and hydrometeorological datasets as well as unrestricted access to such datasets. The goal of this study is to highlight the limitations of existing datasets and pave the way for a community-led effort towards building the needed datasets. GHI serves as a placeholder until such datasets become available. Potential improvements to GHI are discussed. GHI is publicly available at https://doi.org/10.5281/zenodo.7563599 (Goteti, 2023).
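The three-way station grouping and the runoff–precipitation check described above can be illustrated with a minimal Python sketch. This is not the procedure used to build GHI. The `Station` record, the `MIN_VALID_DAYS` threshold for "adequate" daily streamflow data and the `max_ratio` cutoff for a "spurious" ratio are all hypothetical, since the abstract does not define them.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical threshold: the abstract does not say how many daily
# records count as "adequate" streamflow data.
MIN_VALID_DAYS = 3650


@dataclass
class Station:
    """Hypothetical station record; field names are illustrative only."""
    name: str
    metadata_reliable: bool
    valid_streamflow_days: int


def assign_group(station: Station, min_valid_days: int = MIN_VALID_DAYS) -> int:
    """Return 1, 2 or 3 following the group definitions in the abstract."""
    if not station.metadata_reliable:
        return 3  # missing or unreliable metadata
    if station.valid_streamflow_days >= min_valid_days:
        return 1  # reliable metadata, adequate daily streamflow
    return 2      # reliable metadata, inadequate or no daily streamflow


def runoff_ratio(runoff_mm: float, precip_mm: float) -> Optional[float]:
    """Annual runoff depth divided by annual precipitation depth."""
    if precip_mm <= 0:
        return None  # ratio is undefined without precipitation
    return runoff_mm / precip_mm


def is_suspect(runoff_mm: float, precip_mm: float, max_ratio: float = 1.0) -> bool:
    """Flag a catchment-year whose ratio is undefined or exceeds max_ratio.

    The cutoff of 1.0 (runoff exceeding precipitation) is an assumed
    screening rule, not a criterion taken from the study.
    """
    ratio = runoff_ratio(runoff_mm, precip_mm)
    return ratio is None or ratio > max_ratio
```

Under these assumptions, a station with reliable metadata and 5000 valid days falls into Group 1, and a catchment-year with 1200 mm of runoff against 1000 mm of precipitation is flagged for review.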