Predicting climate types for the Continental United States using unsupervised clustering techniques

D Sathiaraj, X Huang, J Chen

doi:10.1002/env.2524

Abstract

The problem of clustering climate data observation sites and grouping them by their climate types is considered. Machine learning–based clustering algorithms are used to analyze climate data time series from more than 3,000 climate observation sites in the United States, with the objective of classifying the climate type of regions across the United States. Understanding the climate type of a region has applications in public health, environment, actuarial science, insurance, agriculture, and engineering.

In this study, daily temperature and precipitation measurements from the period 1946–2015 were used. The daily observations were grouped into three derived data sets: a monthly data set (daily data aggregated by month), an annual data set (daily data aggregated by year), and a threshold exceeding frequency data set (the frequency of occurrence of certain climate extremes). Three existing clustering algorithms from the literature, namely, k‐means, density‐based spatial clustering of applications with noise (DBSCAN), and balanced iterative reducing and clustering using hierarchies (BIRCH), were each applied to each of the data sets, and the resulting clusters were assessed using standardized clustering indices. The results from these unsupervised learning techniques revealed the suitability and applicability of these algorithms in the climate domain. The clusters identified by these techniques were also compared with existing climate classification types such as the Köppen classification system. Additionally, an interactive web and map‐based data visualization system was developed that uses efficient big data management techniques to provide clustering solutions in real time and to display the results of the clustering analysis.
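The workflow summarized above (aggregate daily records into a monthly data set, cluster the sites with k‐means, DBSCAN, and BIRCH, then score the clusters) can be illustrated with a minimal Python sketch using pandas and scikit-learn. This is not the authors' implementation. The station data are synthetic, and the three climate regimes, the column names (`site`, `tmean`, `precip`), the cluster count, the DBSCAN `eps` value, and the choice of the silhouette coefficient as the clustering index are all assumptions made for illustration; the abstract does not specify them.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans, DBSCAN, Birch
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)


def synthetic_daily(n_sites=60, start="2014-01-01", end="2015-12-31"):
    """Hypothetical daily temperature/precipitation records (not real station data)."""
    days = pd.date_range(start, end, freq="D")
    doy = days.dayofyear.to_numpy()
    seasonal = 10.0 * np.sin(2 * np.pi * (doy - 100) / 365.25)
    frames = []
    for site in range(n_sites):
        regime = site % 3                      # three made-up climate regimes
        base_temp = (5.0, 15.0, 25.0)[regime]  # assumed mean temperature
        wetness = (1.0, 3.0, 6.0)[regime]      # assumed precipitation scale
        frames.append(pd.DataFrame({
            "site": site,
            "date": days,
            "tmean": base_temp + seasonal + rng.normal(0, 2, len(days)),
            "precip": rng.gamma(0.5, wetness, len(days)),
        }))
    return pd.concat(frames, ignore_index=True)


def monthly_features(daily):
    """One row per site: 12 mean temperatures and 12 mean monthly precipitation totals."""
    daily = daily.assign(year=daily["date"].dt.year, month=daily["date"].dt.month)
    # Aggregate daily values within each site/year/month.
    per_month = daily.groupby(["site", "year", "month"]).agg(
        tmean=("tmean", "mean"), precip=("precip", "sum")
    )
    # Average over years to get one value per site and calendar month.
    climatology = per_month.groupby(["site", "month"]).mean()
    # Pivot months into columns, giving 24 features per site.
    return climatology.unstack("month")


features = monthly_features(synthetic_daily())
X = StandardScaler().fit_transform(features.to_numpy())

# Hyperparameters below are illustrative guesses, not values from the study.
models = {
    "k-means": KMeans(n_clusters=3, n_init=10, random_state=0),
    "DBSCAN": DBSCAN(eps=3.5, min_samples=4),
    "BIRCH": Birch(n_clusters=3),
}

scores = {}
for name, model in models.items():
    labels = model.fit_predict(X)
    keep = labels != -1                        # DBSCAN marks noise points as -1
    n_found = len(set(labels[keep]))
    # The silhouette coefficient needs at least two clusters to be defined.
    scores[name] = (
        silhouette_score(X[keep], labels[keep]) if n_found >= 2 else float("nan")
    )

for name, score in scores.items():
    print(f"{name}: silhouette = {score:.3f}")
```

Standardizing the features first keeps temperature and precipitation on comparable scales, which matters for all three distance-based algorithms. An annual or threshold exceeding frequency data set could be built the same way by changing the aggregation step in `monthly_features`.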

Full Text