Abstract

The dataset, obtained from the rp5 data archive and provided by LLC "Weather Schedule", records weather conditions in all parts of the world, including temperature, cloud cover, and precipitation, and is openly available to everyone for analysis. Observations are collected every 3 hours, which gives a detailed day-to-day record. Many of the features are optional and time-dependent: for example, the maximum night-time temperature, or the visibility range, which always equals zero at night. The purpose of this study is to build a solution that can handle missing data and to identify an effective combination of machine-learning methods for weather prediction. The study is deliberately carried out on open data, since such data carries no built-in presumptions. By working with raw, open data, we aim to establish new rules for weather prediction modeling and to find meaningful solutions for Kazakhstani society. The prediction models were implemented with algorithms from the scikit-learn Python library and related packages, including Gradient Boosting Regressor, XGBoost, CatBoost, Linear Regression, and Bayesian Ridge. The applied algorithms were evaluated under different approaches, from various data preprocessing ideas to selecting the best-performing model and optimizing it to achieve the best possible predictions. The main aim of this study is to help find an optimal approach to weather prediction problems and to analyze how current trends bear on them.
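As an illustration of the kind of workflow described above, the following is a minimal sketch, not the study's actual pipeline. It assumes synthetic data in place of the rp5 archive, median imputation as one possible way of handling missing values, and a time-ordered cross-validation split; the names `compare_models`, `X_missing`, and `scores` are hypothetical. XGBoost and CatBoost are omitted because they ship as separate packages.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.impute import SimpleImputer
from sklearn.linear_model import BayesianRidge, LinearRegression
from sklearn.model_selection import TimeSeriesSplit, cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)

# Synthetic stand-in for 3-hourly observations (assumption: NOT the rp5 data).
n_samples = 400
X = rng.normal(size=(n_samples, 4))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.1, size=n_samples)

# Blank out roughly 20% of the feature values to mimic optional measurements.
X_missing = X.copy()
X_missing[rng.random(X.shape) < 0.2] = np.nan

# Candidate regressors named in the abstract that are part of scikit-learn.
models = {
    "gradient_boosting": GradientBoostingRegressor(random_state=0),
    "linear_regression": LinearRegression(),
    "bayesian_ridge": BayesianRidge(),
}


def compare_models(X, y, models, n_splits=5):
    """Return the mean absolute error of each model under time-ordered CV."""
    cv = TimeSeriesSplit(n_splits=n_splits)  # train on the past, test on the future
    scores = {}
    for name, model in models.items():
        # Imputation sits inside the pipeline so it is fit on training folds only.
        pipeline = make_pipeline(SimpleImputer(strategy="median"), model)
        neg_mae = cross_val_score(
            pipeline, X, y, cv=cv, scoring="neg_mean_absolute_error"
        )
        scores[name] = float(-neg_mae.mean())
    return scores


scores = compare_models(X_missing, y, models)
best_model = min(scores, key=scores.get)  # lowest error wins
print(scores, best_model)
```

Keeping the imputer inside the pipeline avoids leaking statistics from the test folds into training. Other imputation strategies or error metrics could be swapped in without changing the structure of the comparison.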
