Using XGBoost models for daily rainfall prediction

Abstract

Machine learning models for predicting daily precipitation have gained traction in recent years, and understanding the benefits of this technology in different regions is a relevant research topic. This study therefore evaluates daily precipitation forecasts estimated from climate data recorded between 1983 and 2019 in Itirapina, São Paulo, Brazil. We used a novel machine learning algorithm, XGBoost (eXtreme Gradient Boosting), to create several daily precipitation prediction models. Two tasks were modeled: the occurrence of daily precipitation (classification) and the amount of daily precipitation (regression). The results revealed that the occurrence of daily precipitation could be predicted with an accuracy of around 90%, and that the amount of daily precipitation could be predicted with errors of around 3 mm. We observed that precipitation in the study area is directly associated with solar radiation, and that the estimated precipitation forecasts and their corresponding months are characteristic of a tropical climate.
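
The two-task setup described in the abstract (occurrence as classification, amount as regression) can be illustrated with a minimal sketch. It is not the authors' pipeline and it does not call the XGBoost library: it swaps in plain gradient boosting with depth-one trees (stumps), written in pure Python, and runs on synthetic data. The two predictors, the 1.0 mm wet-day threshold, the split grid, the learning rate and the number of rounds are all assumptions chosen for illustration.

```python
import math
import random

# Candidate split thresholds; the synthetic predictors below lie in [0, 1].
GRID = [i / 10 for i in range(1, 10)]

def fit_stump(xs, targets):
    """Return (feature, threshold, left_mean, right_mean) minimising squared error."""
    best = None
    for j in range(len(xs[0])):
        for t in GRID:
            left = [g for row, g in zip(xs, targets) if row[j] <= t]
            right = [g for row, g in zip(xs, targets) if row[j] > t]
            if not left or not right:
                continue  # skip splits that leave one side empty
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((g - lm) ** 2 for g in left) + sum((g - rm) ** 2 for g in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    return best[1:]

def boost(xs, ys, neg_grad, init, rounds=40, lr=0.3):
    """Plain gradient boosting: each stump is fitted to the negative gradient."""
    scores = [init] * len(xs)
    stumps = []
    for _ in range(rounds):
        targets = [neg_grad(y, s) for y, s in zip(ys, scores)]
        j, t, lm, rm = fit_stump(xs, targets)
        stumps.append((j, t, lr * lm, lr * rm))
        scores = [s + (lr * lm if row[j] <= t else lr * rm) for row, s in zip(xs, scores)]
    return init, stumps

def predict(model, row):
    init, stumps = model
    return init + sum(lv if row[j] <= t else rv for j, t, lv, rv in stumps)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Synthetic data (an assumption, NOT the Itirapina record): two predictors in
# [0, 1]; only the second one drives the rainfall amount in mm.
random.seed(0)

def make_day():
    x = [random.random(), random.random()]
    amount = max(0.0, 20.0 * (x[1] - 0.55) + random.gauss(0.0, 2.0))
    return x, amount

days = [make_day() for _ in range(500)]
train, test = days[:300], days[300:]
WET_MM = 1.0  # hypothetical wet-day threshold

X_tr = [x for x, _ in train]
amt_tr = [a for _, a in train]
wet_tr = [1.0 if a >= WET_MM else 0.0 for a in amt_tr]

# Task 1: occurrence (logistic loss; the boosted score is a log-odds value).
clf = boost(X_tr, wet_tr, lambda y, s: y - sigmoid(s), init=0.0)
# Task 2: amount (squared loss; boosting starts from the training mean).
mean_amt = sum(amt_tr) / len(amt_tr)
reg = boost(X_tr, amt_tr, lambda y, s: y - s, init=mean_amt)

# Hold-out evaluation: accuracy for occurrence, mean absolute error for amount.
accuracy = sum((predict(clf, x) > 0.0) == (a >= WET_MM) for x, a in test) / len(test)
mae = sum(abs(predict(reg, x) - a) for x, a in test) / len(test)
baseline_mae = sum(abs(mean_amt - a) for _, a in test) / len(test)
print(f"occurrence accuracy: {accuracy:.2f}")
print(f"amount MAE: {mae:.2f} mm (constant-mean baseline: {baseline_mae:.2f} mm)")
```

XGBoost itself adds regularisation and second-order leaf weights on top of this scheme, so a real study would call the library rather than a hand-written loop. The abstract does not say which error metric underlies the "around 3 mm" figure; mean absolute error is used here only as one plausible choice.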

Similar Papers
  • Research Article
  • Cite count: 67
  • 10.1007/s10584-012-0451-3
Multisite statistical downscaling model for daily precipitation combined by multivariate multiple linear regression and stochastic weather generator
  • Mar 24, 2012
  • Climatic Change
  • D I Jeong + 3 more

This study provides a multi-site hybrid statistical downscaling procedure combining regression-based and stochastic weather generation approaches for multisite simulation of daily precipitation. In the hybrid model, the multivariate multiple linear regression (MMLR) is employed for simultaneous downscaling of deterministic series of daily precipitation occurrence and amount using large-scale reanalysis predictors over nine different observed stations in southern Quebec (Canada). The multivariate normal distribution, the first-order Markov chain model, and the probability distribution mapping technique are employed for reproducing temporal variability and spatial dependency on the multisite observations of precipitation series. The regression-based MMLR model explained 16 % ~ 22 % of total variance in daily precipitation occurrence series and 13 % ~ 25 % of total variance in daily precipitation amount series of the nine observation sites. Moreover, it constantly over-represented the spatial dependency of daily precipitation occurrence and amount. In generating daily precipitation, the hybrid model showed good temporal reproduction ability for number of wet days, cross-site correlation, and probabilities of consecutive wet days, and maximum 3-days precipitation total amount for all observation sites. However, the reproducing ability of the hybrid model for spatio-temporal variations can be improved, i.e. to further increase the explained variance of the observed precipitation series, as for example by using regional-scale predictors in the MMLR model. However, in all downscaling precipitation results, the hybrid model benefits from the stochastic weather generator procedure with respect to the single use of deterministic component in the MMLR model.

  • Research Article
  • Cite count: 158
  • 10.1029/1999jd900119
A spatiotemporal model for downscaling precipitation occurrence and amounts
  • Dec 1, 1999
  • Journal of Geophysical Research: Atmospheres
  • Stephen P Charles + 2 more

A stochastic model that relates synoptic atmospheric data to daily precipitation at a network of gages is presented. The model extends the nonhomogeneous hidden Markov model (NHMM) of Hughes et al. by incorporating precipitation amounts. The NHMM assumes that multisite, daily precipitation occurrence patterns are driven by a finite number of unobserved weather states that evolve temporally according to a first‐order Markov chain. The state transition probabilities are a function of observed or modeled synoptic scale atmospheric variables such as mean sea level pressure. For each weather state we evaluate the joint distribution of daily precipitation amounts at n sites through the specification of n conditional distributions. The conditional distributions consist of regressions of transformed amounts at a given site on precipitation occurrence at neighboring sites within a set radius. Results for a network of 30 daily precipitation gages and historical atmospheric circulation data in southwestern Australia indicate that the extended NHMM accurately simulates the wet‐day probabilities, survival curves for dry‐ and wet‐spell lengths, daily precipitation amount distributions at each site, and intersite correlations for daily precipitation amounts over the 15 year period from 1978 to 1992.

  • Research Article
  • Cite count: 173
  • 10.1016/j.jhydrol.2005.02.020
Multi-site downscaling of heavy daily precipitation occurrence and amounts
  • Apr 14, 2005
  • Journal of Hydrology
  • Colin Harpham + 1 more


  • Research Article
  • Cite count: 6
  • 10.1016/j.heliyon.2024.e35933
Advancing sub-seasonal to seasonal multi-model ensemble precipitation prediction in east asia: Deep learning-based post-processing for improved accuracy
  • Aug 1, 2024
  • Heliyon
  • Uran Chung + 3 more


  • Research Article
  • Cite count: 3
  • 10.1016/j.jhydrol.2022.128065
Estimating multisite precipitation by a stepwise NHMM-VAR model considering the spatiotemporal correlations of precipitation amounts
  • Sep 1, 2022
  • Journal of Hydrology
  • Xini Zha + 5 more


  • Research Article
  • Cite count: 2
  • 10.2166/wcc.2019.403
Statistical tool for modeling of a daily precipitation process in the context of climate change
  • Nov 29, 2019
  • Journal of Water and Climate Change
  • Myeong-Ho Yeo + 2 more

The present study proposes a climate change assessment tool based on a statistical downscaling (SD) approach for describing the linkage between large-scale climate predictors and observed daily rainfall characteristics at a local site. The proposed SD of the daily rainfall process (SDRain) model is based on a combination of a logistic regression model for representing the daily rainfall occurrences and a nonlinear regression model for describing the daily precipitation amounts. A scaling factor (SR) and correction coefficient (CR) are suggested to improve the accuracy of the SDRain model in representing the variance of the observed daily precipitation amounts in each month without affecting the monthly mean precipitation. SDRain facilitates the construction of daily precipitation models for the current and future climate conditions. The tool is tested using the National Center for Environmental Prediction re-analysis data and the observed daily precipitation data available for the 1961–2001 period at two study sites located in two completely different climatic regions: the Seoul station in subtropical-climate Korea and the Dorval Airport station in cold-climate Canada. Results of this illustrative application have indicated that the proposed functions (e.g. logistic regression, SR, and CR) contribute marked improvement in describing daily precipitation amounts and occurrences. Furthermore, the comparison analyses show that the proposed SD method could provide more accurate results than those given by the currently popular SDSM method.

  • Research Article
  • Cite count: 23
  • 10.13031/trans.57.10685
Multi-Site Stochastic Weather Generator for Daily Precipitation and Temperature
  • Oct 11, 2014
  • Transactions of the ASABE
  • Jie Chen + 2 more

Abstract. Stochastic weather generators are used to generate time series of climate variables that have statistical properties similar to those of observed data. Most stochastic weather generators work for a single site and can only generate climate data at a single point or independent time series at several points. However, for hydrological impact studies, spatially coherent climate information is usually required at several locations over a watershed. This climate information can be generated using a multi-site weather generator. This article presents a new Matlab-based stochastic weather generator (MulGETS) for generating multi-site precipitation and temperature. MulGETS is an extension of a single-site weather generator that makes it possible to drive individual single-site models with temporally independent but spatially correlated random numbers. Similar to an unmodified single-site weather generator, precipitation occurrence is generated using a first-order two-state Markov chain, and temperature is generated using a first-order linear autoregressive model. However, instead of generating daily precipitation amounts based on a single gamma distribution, MulGETS uses a multi-gamma distribution to address the spatial correlation of precipitation amounts. The performance of MulGETS is evaluated with respect to its ability to produce the spatial correlation and statistical characteristics of daily precipitation and temperature for five watersheds selected from different climate conditions. The five watersheds also differ in watershed size and number of stations. The results show that MulGETS accurately preserves the spatial correlation of precipitation occurrence and amounts as well as the maximum and minimum temperatures for all watersheds. The joint probabilities of precipitation occurrence are also reasonably well reproduced. 
Additionally, MulGETS is capable of reproducing the mean and standard deviation of daily precipitation amounts for individual sites, as well as the watershed-averaged precipitation. Overall, MulGETS is an effective model for generating multi-site precipitation and temperature. It can easily be used as a downscaling tool for climate change impact studies by modifying its parameters based on climate model outputs. The entire set of Matlab routines utilized is available on the Mathworks file exchange site.
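
The single-site building blocks named in this abstract, a first-order two-state Markov chain for occurrence and a gamma distribution for wet-day amounts, can be sketched in a few lines. This is not MulGETS (which is Matlab-based and multi-site); it is a single-site illustration in Python, and the transition probabilities and gamma parameters are hypothetical values, not fitted to any station.

```python
import random

def generate_precip(n_days, p_wet_after_dry, p_wet_after_wet, shape, scale, rng):
    """Single-site daily precipitation series in mm.

    Occurrence follows a first-order two-state (dry/wet) Markov chain;
    amounts on wet days are drawn from a gamma distribution.
    """
    series = []
    wet = False  # assume the day before the series starts was dry
    for _ in range(n_days):
        p_wet = p_wet_after_wet if wet else p_wet_after_dry
        wet = rng.random() < p_wet
        series.append(rng.gammavariate(shape, scale) if wet else 0.0)
    return series

rng = random.Random(42)
# Hypothetical parameters: P(wet | dry yesterday) = 0.3, P(wet | wet yesterday) = 0.6,
# gamma shape 0.8 and scale 8.0 mm (mean wet-day amount = shape * scale = 6.4 mm).
series = generate_precip(3650, 0.3, 0.6, 0.8, 8.0, rng)

wet_fraction = sum(a > 0 for a in series) / len(series)
print(f"wet-day fraction: {wet_fraction:.3f}")
```

For these transition probabilities the long-run wet-day fraction of the chain is 0.3 / (1 + 0.3 - 0.6) ≈ 0.43, so the simulated fraction should land near that value. A multi-site generator would additionally drive the sites with spatially correlated random numbers, which this sketch omits.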

  • Research Article
  • Cite count: 6
  • 10.1029/2008wr007526
Simulation of multisite precipitation using an extended chain‐dependent process
  • Jan 1, 2010
  • Water Resources Research
  • Xiaogu Zheng + 2 more

The chain‐dependent process is a popular stochastic model for precipitation sequence data. In this paper, the effect of daily regional precipitation occurrence is incorporated into the stochastic model. This model is applied to analyze the daily precipitation at a small number of sites in the upper Waitaki catchment, New Zealand. In this case study, the probability distributions of daily precipitation occurrence and intensity, spatial dependences, and the relation between precipitation and atmospheric forcings are simulated quite well. Specifically, some behaviors which are not well modeled by existing models, such as the extremal behavior of daily precipitation intensity, the lag 1 cross correlation of daily precipitation occurrence, spatial intermittency, and spatial correlation of seasonal precipitation totals, are significantly improved. Moreover, a new and simpler approach is proposed which successfully eliminates overdispersion, i.e., underestimation of the variance of seasonal precipitation totals.

  • Research Article
  • Cite count: 127
  • 10.1175/bams-84-4-481
The WGNE Assessment of Short-term Quantitative Precipitation Forecasts
  • Apr 1, 2003
  • Bulletin of the American Meteorological Society
  • Elizabeth E Ebert + 3 more

Twenty-four-hour and 48-h quantitative precipitation forecasts (QPFs) from 11 operational numerical weather prediction models have been verified for a 4-yr period against rain gauge observations over the United States, Germany, and Australia to assess their skill in predicting the occurrence and amount of daily precipitation. Model QPFs had greater skill in winter than in summer, and greater skill in midlatitudes than in Tropics, where they performed only marginally better than “persistence.” The best agreement among models, as well as the best ability to discriminate raining areas, occurred for a low rain threshold of 1–2 mm d−1. In contrast, the skill for forecasts of rain greater than 20 mm d−1 was generally quite low, reflecting the difficulty in predicting precisely when and where heavy rain will fall. The location errors for rain systems, determined using pattern matching with the observations, were typically about 100 km for 24-h forecasts, with smaller errors occurring for the heaviest r...

  • Research Article
  • Cite count: 384
  • 10.1175/2008jamc1979.1
Development and Testing of Canada-Wide Interpolated Spatial Models of Daily Minimum–Maximum Temperature and Precipitation for 1961–2003
  • Apr 1, 2009
  • Journal of Applied Meteorology and Climatology
  • Michael F Hutchinson + 6 more

The application of trivariate thin-plate smoothing splines to the interpolation of daily weather data is investigated. The method was used to develop spatial models of daily minimum and maximum temperature and daily precipitation for all of Canada, at a spatial resolution of 300 arc s of latitude and longitude, for the period 1961–2003. Each daily model was optimized automatically by minimizing the generalized cross validation. The fitted trivariate splines incorporated a spatially varying dependence on ground elevation and were able to adapt automatically to the large variation in station density over Canada. Extensive quality control measures were performed on the source data. Error estimates for the fitted surfaces based on withheld data across southern Canada were comparable to, or smaller than, errors obtained by daily interpolation studies elsewhere with denser data networks. Mean absolute errors in daily maximum and minimum temperature averaged over all years were 1.1° and 1.6°C, respectively. Daily temperature extremes were also well matched. Daily precipitation is challenging because of short correlation length scales, the preponderance of zeros, and significant error associated with measurement of snow. A two-stage approach was adopted in which precipitation occurrence was estimated and then used in conjunction with a surface of positive precipitation values. Daily precipitation occurrence was correctly predicted 83% of the time. Withheld errors in daily precipitation were small, with mean absolute errors of 2.9 mm, although these were relatively large in percentage terms. However, mean percent absolute errors in seasonal and annual precipitation totals were 14% and 9%, respectively, and seasonal precipitation upper 95th percentiles were attenuated on average by 8%. Precipitation and daily maximum temperatures were most accurately interpolated in the autumn, consistent with the large well-organized synoptic systems that prevail in this season. 
Daily minimum temperatures were most accurately interpolated in summer. The withheld data tests indicate that the models can be used with confidence across southern Canada in applications that depend on daily temperature and accumulated seasonal and annual precipitation. They should be used with care in applications that depend critically on daily precipitation extremes.

  • Research Article
  • Cite count: 8
  • 10.1002/met.9
Prediction of occurrence and quantity of daily summer monsoon precipitation over Orissa (India)
  • Mar 1, 2007
  • Meteorological Applications
  • U C Mohanty + 1 more

The precipitation over Orissa State, a meteorological subdivision on the east coast of India, shows large‐scale spatio‐temporal variation caused by the interaction of the basic monsoon flow with the monsoon disturbances over the Bay of Bengal and the orography owing to the Eastern Ghats and other hill peaks in Orissa and its neighbourhood. Hence, it is difficult to predict daily precipitation over Orissa. The objective of this study is to predict the occurrence and quantity of precipitation 24 h ahead, over specific locations of Orissa during the summer monsoon season (June‐September). For this purpose, a probability of precipitation (PoP) model has been developed by applying stepwise regression with the available surface and upper air parameters from synoptic and radiosonde and radio wind stations in and around Orissa as potential predictors. The parameters selected through stepwise regression for the PoP model have been used to develop a probabilistic model for a Quantitative Precipitation Forecast (QPF) in different ranges, such as 0.1–10, 11–25, 26–50, 51–100 and > 100 mm, using Multiple Discriminant Analysis (MDA). Both the PoP and QPF models have been developed based on data from 1980 to 1994 and verified with the data from 1995 to 1998. Considering six representative stations for six homogeneous regions in Orissa, the PoP model performs very well with percentages of correct forecast for occurrence/non‐occurrence of precipitation being about 73 and 65% respectively for developmental and independent data. However, the skill score of the MDA model for categorical forecast is poor, especially for higher values of precipitation. Copyright © 2007 Royal Meteorological Society

  • Research Article
  • Cite count: 22
  • 10.1016/j.jhydrol.2003.11.027
Generation of daily amounts of precipitation from standard climatic data: a case study for Argentina
  • Feb 20, 2004
  • Journal of Hydrology
  • F Castellvı́ + 2 more


  • Research Article
  • Cite count: 25
  • 10.1155/2022/2220527
An External-Validated Prediction Model to Predict Lung Metastasis among Osteosarcoma: A Multicenter Analysis Based on Machine Learning.
  • May 6, 2022
  • Computational Intelligence and Neuroscience
  • Wenle Li + 12 more

Background: Lung metastasis greatly affects medical therapeutic strategies in osteosarcoma. This study aimed to develop and validate a clinical prediction model to predict the risk of lung metastasis among osteosarcoma patients based on machine learning (ML) algorithms. Methods: We retrospectively collected osteosarcoma patients from the Surveillance Epidemiology and End Results (SEER) database and from four hospitals in China. Six ML algorithms, including logistic regression (LR), gradient boosting machine (GBM), extreme gradient boosting (XGBoost), random forest (RF), decision tree (DT), and multilayer perceptron (MLP), were applied to build predictive models for predicting lung metastasis using patient's demographics, clinical characteristics, and therapeutic variables from the SEER database. The model was internally validated using 10-fold cross-validation to calculate the mean area under the curve (AUC) and the model was externally validated using the Chinese multicenter osteosarcoma data. Relative importance ranking of predictors was plotted to understand the importance of each predictor in different ML algorithms. The correlation heat map of predictors was plotted to understand the correlation of each predictor, selecting the 10-fold cross-validation with the highest AUC value in the external validation ROC curve to build a web calculator. Results: Of all enrolled patients from the SEER database, 17.73% (194/1094) developed lung metastasis. The multiple logistic regression analysis showed that sex, N stage, T stage, surgery, and bone metastasis were all independent risk factors for lung metastasis. In predicting lung metastasis, the mean AUCs of the six ML algorithms ranged from 0.711 to 0.738 in internal validation and 0.697 to 0.729 in external validation. Among the six ML algorithms, the extreme gradient boosting (XGBoost) model had the highest AUC value with an average internal AUC of 0.738 and an external AUC of 0.729. The best performing ML algorithm model was used to build a web calculator to facilitate clinicians to calculate the risk of lung metastasis for each patient. Conclusions: The XGBoost model may have the best prediction effect and the online calculator based on this model can help doctors to determine the lung metastasis risk of osteosarcoma patients and help to make individualized medical strategies.

  • Research Article
  • Cite count: 17
  • 10.1175/jcli-d-12-00302.1
Mapping Weather-Type Influence on Senegal Precipitation Based on a Spatial–Temporal Statistical Model*
  • Oct 4, 2013
  • Journal of Climate
  • Henning W Rust + 3 more

Senegal is particularly vulnerable to precipitation variability. To investigate the influence of large-scale circulation on local-scale precipitation, a full spatial–statistical description of precipitation occurrence and amount for Senegal is developed. These regression-type models have been built on the basis of daily records at 137 locations and were developed in two stages: (i) a baseline model describing the expected daily occurrence probability and precipitation amount as spatial fields from monsoon onset to offset, and (ii) the inclusion of weather types defined from the NCEP–NCAR reanalysis 850-hPa winds and 925-hPa relative humidity establishing the link to the synoptic-scale atmospheric circulation. During peak phase, the resulting types appear in two main cycles that can be linked to passing African easterly waves. The models allow the investigation of the spatial response of precipitation occurrence and amount to a discrete set of preferred states of the atmospheric circulation. As such, they can be used for drought risk mapping and the downscaling of climate change projections. Necessary choices, such as filtering and scaling of the atmospheric data (as well as the number of weather types to be used), have been made on the basis of the precipitation models' performance instead of relying on external criteria. It could be demonstrated that the inclusion of the synoptic-scale weather types leads to skill on the local and daily scale. On the interannual scale, the models for precipitation occurrence and amount capture 26% and 38% of the interannual spatially averaged variability, corresponding to Pearson correlation coefficients of rO = 0.52 and ri = 0.65, respectively.

  • Research Article
  • Cite count: 5
  • 10.31481/uhmj.22.2018.04
Nature of extreme precipitation over Ukraine in the 21st century
  • Dec 3, 2018
  • Ukrainian hydrometeorological journal
  • V F Martazinova + 1 more

The article examines the state of precipitation over the territory of Ukraine in recent decades. Using the central month of each season as an example, it shows the differences in monthly and average daily precipitation amounts for the period 2000-2014. Over most of Ukraine, summer precipitation is almost twice as high as spring and autumn precipitation. In all seasons the greatest amount of precipitation is observed in the Carpathian region. The distribution of average long-term precipitation over the rest of the territory is the same in spring, summer and autumn: the highest values are observed in the western and north-western parts and decrease towards the south-east.
The article studies the yearly precipitation rate at lowland and mountain meteorological stations and proposes region-specific criteria for precipitation extremality. All extreme daily precipitation can be divided into the following categories: > 20-30 mm/day, > 30-50 mm/day and > 50 mm/day. Each category of extreme precipitation carries a certain economic risk, but the third category can cause not only economic risks but also risks to human life and activities. The distinctive feature of present-day precipitation is its redistribution within the month: the daily precipitation rate increases together with the intervals between heavy rains.
To analyze changes in the precipitation regime, the article proposes dividing monthly precipitation totals into extreme and non-extreme components. A comparative analysis of daily precipitation in different seasons and over different climatic periods was also carried out. The article studies the shares of daily precipitation of up to 15 mm and of more than 15 mm in the monthly precipitation totals over the territory of Ukraine. In January, rainfalls exceeding 15 mm make up 5-10 % of the monthly total, except in the Carpathian region and the southwestern regions of Ukraine, where they exceed 20-25 %. In spring, the amount of such rainfall increases and its share of the monthly total is around 20 % over most regions. Towards summer the amount continues to increase, and in July its share is 50-70 %. Towards autumn it starts to decrease; however, the share is still almost twice as high as in spring, at about 30-40 % for most regions. Breaking the monthly totals down into these two components makes it possible to determine what precipitation amounts occur in each month of the period in question.
The maximum daily precipitation amount is an important indicator of the precipitation regime because it shows the potential danger from extreme precipitation. The threshold values of the upper limit of rainfall, taken as the maximum daily value for the period 2000-2014, differ between regions. In winter and spring, the daily rainfall limit usually reaches 20-30 mm over most of the country, although in certain areas it reaches 40-50 mm. The most significant rainfalls are observed in summer. Although the territory affected by such rainfalls is quite patchy, the areas where one-day precipitation may reach 70 mm are the most vulnerable and carry high risks for human life and activities. In autumn, the threshold values are 30-40 mm.
Breaking the monthly precipitation totals down into extreme and non-extreme components will make it possible to determine whether changes in the precipitation regime are driven by extreme or non-extreme values. In the longer run, a comparative analysis of showers and light rainfalls in the late 20th and early 21st centuries can be carried out, and the tendency of seasonal change in the precipitation regime over the next decade can be estimated, which will help to identify the regions vulnerable to extreme precipitation.
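
The split of a monthly total into daily amounts of up to 15 mm and amounts exceeding 15 mm, as described in this abstract, amounts to simple arithmetic. The sketch below shows one way to compute it; the function name and the daily values are hypothetical and are not taken from the article.

```python
def split_monthly_total(daily_mm, threshold_mm=15.0):
    """Split a month's precipitation into non-extreme and extreme parts.

    Days with more than `threshold_mm` count as extreme; the share is the
    extreme part divided by the monthly total (0.0 for a completely dry month).
    """
    total = sum(daily_mm)
    heavy = sum(a for a in daily_mm if a > threshold_mm)
    light = total - heavy
    share = heavy / total if total > 0 else 0.0
    return light, heavy, share

# Hypothetical daily amounts in mm for part of a month.
daily_mm = [0.0, 2.5, 16.0, 0.0, 30.0, 4.5, 7.0]
light, heavy, share = split_monthly_total(daily_mm)
print(f"non-extreme: {light} mm, extreme: {heavy} mm, extreme share: {share:.0%}")
```

Here two days exceed 15 mm (16.0 and 30.0 mm), so 46 of the 60 mm total falls in the extreme component.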
