Predicting weather, or atmospheric conditions, with AI and machine learning techniques is a challenging task that computer and data scientists have long tried to perform successfully. The objective here is to predict the weather at a given place for certain days, in this case ALIPORE (42807). We collected ALIPORE surface data (a CSV file) for the period 1969-2023. On this large data set we carried out data mining and the necessary feature engineering steps, and selected the dependent and independent parameters (predictors), using Python packages such as pandas, seaborn, statsmodels and scikit-learn. The ML code and techniques applied, some of them based on statistical theory, included checking the data shape, dropping null values, 'describe' summaries, label encoding, the IV method and the VIF method. Finally, a logistic regression model was built with a train-test split to predict future weather as 'SIGNIFICANT' or 'CLEAR' for a given test array. For the analysis, all weather phenomena in the data set were classified into two categories: (1) any occurrence of 'Lightning (code 0)', 'Drizzle (code 5)', 'Rain (code 6)' or 'Thunderstorm with rain (code 9)' was labelled '1', or 'SIGNIFICANT' weather; (2) all other weather was labelled '0', or 'CLEAR' weather.

Keywords: Confusion matrix, Heat map, True positive, True negative, False positive, False negative, Accuracy score, Classification report, Precision, Recall, F1 score.
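The labelling rule and the evaluation terms listed in the keywords can be illustrated with a minimal sketch. It is not the authors' code: the function names, the sample weather codes other than 0, 5, 6 and 9, and the predicted labels are all hypothetical, and the metrics are computed by hand with the standard library rather than with the scikit-learn routines the study presumably used.

```python
# Weather codes treated as SIGNIFICANT, as listed in the abstract:
# 0 = lightning, 5 = drizzle, 6 = rain, 9 = thunderstorm with rain.
SIGNIFICANT_CODES = {0, 5, 6, 9}


def label_weather(code):
    """Return 1 (SIGNIFICANT) or 0 (CLEAR) for a weather code."""
    return 1 if code in SIGNIFICANT_CODES else 0


def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def scores(tp, tn, fp, fn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical observed codes (1, 2 and 3 stand in for "all other weather").
observed_codes = [0, 5, 2, 6, 9, 1, 3, 6]
y_true = [label_weather(c) for c in observed_codes]

# Hypothetical model output for the same eight observations.
y_pred = [1, 0, 0, 1, 1, 1, 0, 1]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
result = scores(tp, tn, fp, fn)
print(tp, tn, fp, fn, result)
```

In a scikit-learn workflow, the same quantities would typically come from `sklearn.metrics.confusion_matrix` and `classification_report` applied to the held-out part of a train-test split.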