Abstract

To expand our knowledge of the climate in the Lesser Antilles, we attempted to identify the spatio-temporal configurations of daily weather. We identified several pitfalls that can lead to poor results when clustering algorithms are used, and we propose some steps towards a solution. These advances may be of interest to climate informatics, as well as to the many applications that cluster physical fields. We illustrate the pitfalls with a dataset of cumulative rainfall from NASA’s Tropical Rainfall Measuring Mission for the period 2000 to 2014. The first pitfall is the lack of numerical evaluation of the clusters found by the algorithms, which prevents any comparison between algorithms. We used the silhouette index for this evaluation and to demonstrate the other problems. Second, algorithms such as K-means cluster the points around their barycentre. For many physical fields, this barycentre is trivial, which may lead to poor performance. Third, the L2 norm used in conventional clustering methods, such as K-means and hierarchical agglomerative clustering, focuses on the exact location of fields, which leads to poor evaluations of the similarity between fields. We replaced it with a similarity measure called the expert distance (ED), which compares the histograms of four zones using the symmetrised Kullback–Leibler divergence. It integrates the properties of the observed physical parameter and climate knowledge. With these improvements, the results revealed five clusters with high silhouette indexes. The algorithms now discriminate between the daily scenarios favourably, thereby giving more physical meaning to the resulting clusters. The interpretation of these clusters as weather types is discussed.
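As a rough illustration of the idea behind a zone-histogram distance, the sketch below compares two gridded fields by building one histogram per zone and summing the symmetrised Kullback–Leibler divergence over the zones. It is not the authors' ED: the quadrant split, the bin edges, the smoothing constant `eps`, the sum over zones, and all function names are assumptions made for this example.

```python
import math

def histogram(values, edges):
    """Count values in [edges[i], edges[i+1]); the last bin also includes its upper edge.
    Values outside the edges are ignored."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            last = i == len(counts) - 1
            if edges[i] <= v < edges[i + 1] or (last and v == edges[-1]):
                counts[i] += 1
                break
    return counts

def symmetrised_kl(p_counts, q_counts, eps=1e-9):
    """KL(p||q) + KL(q||p) on normalised, lightly smoothed histograms (eps is an assumption)."""
    p = [c + eps for c in p_counts]
    q = [c + eps for c in q_counts]
    sp, sq = sum(p), sum(q)
    p = [x / sp for x in p]
    q = [x / sq for x in q]
    # KL(p||q) + KL(q||p) simplifies to sum((p - q) * log(p / q)).
    return sum((pi - qi) * math.log(pi / qi) for pi, qi in zip(p, q))

def quadrants(field):
    """Split a 2-D grid (list of rows) into four zones; quadrants are a hypothetical choice."""
    r, c = len(field) // 2, len(field[0]) // 2
    return [
        [v for row in field[:r] for v in row[:c]],
        [v for row in field[:r] for v in row[c:]],
        [v for row in field[r:] for v in row[:c]],
        [v for row in field[r:] for v in row[c:]],
    ]

def zone_histogram_distance(a, b, edges):
    """Sum of symmetrised KL divergences between matching zone histograms."""
    return sum(
        symmetrised_kl(histogram(za, edges), histogram(zb, edges))
        for za, zb in zip(quadrants(a), quadrants(b))
    )

def l2_distance(a, b):
    """Plain cell-by-cell Euclidean distance, for comparison."""
    return math.sqrt(sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)))

# Toy 4x4 rainfall fields (illustrative values, not TRMM data).
edges = [0, 1, 5, 20]
field_a = [[10, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
field_b = [[0, 10, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]  # same zone, shifted one cell
field_c = [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 10]]  # rain in the opposite zone
```

In this toy case the L2 distance treats `field_b` and `field_c` as equally far from `field_a`, whereas the zone-histogram distance is zero for the small shift within a zone and positive when the rain moves to another zone. A pairwise matrix of such distances could then feed a clustering method and a silhouette evaluation that accept precomputed distances.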
