This paper develops a new probability distribution and applies it to a statistical analysis of an air quality dataset from Kathmandu, Nepal. The new model, called the New Extended Kumaraswamy Exponential Distribution, is obtained by introducing an additional shape parameter into the Extended Kumaraswamy Exponential (EKwE) distribution. We study its statistical characteristics, including the cumulative distribution function, probability density function, survival function, hazard rate function, reversed hazard function, skewness, and kurtosis. The proposed model is non-normal and positively skewed, with increasing and inverted bathtub-shaped hazard rate curves. To assess the model's suitability, we use a real dataset of air quality measurements from Kathmandu, Nepal, for the year 2021. The study shows that the air quality data exhibit an increasing failure rate, and that PM2.5, PM10, and total suspended particle concentrations were lowest during the monsoon season and highest during the winter season. The model parameters are estimated by least squares estimation (LSE), maximum likelihood estimation (MLE), and the Cramér-von Mises (CVM) approach for PM10 at the Ratnapark station, Kathmandu. P-P and Q-Q plots are used to assess the model's validity. Models are compared using the Akaike Information Criterion (AIC), Corrected Akaike Information Criterion (CAIC), Bayesian Information Criterion (BIC), and Hannan-Quinn Information Criterion (HQIC). The goodness of fit of the proposed model is further evaluated with the Anderson-Darling (A²), Cramér-von Mises (CVM), and Kolmogorov-Smirnov (KS) test statistics and their respective p-values.
The findings indicate that the air quality of Kathmandu, Nepal, was poor. The proposed distribution provides a better fit, with greater flexibility, for forecasting air quality data and conducting reliability data analyses. The dataset is analyzed and visualized using R.
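The four information criteria used for model comparison all follow from a fitted model's maximized log-likelihood. The sketch below is illustrative only: the analysis itself was carried out in R, and the function name `information_criteria`, the parameter count `k`, the sample size `n`, and the log-likelihood value are hypothetical placeholders rather than results from this study. It assumes the usual definitions, with CAIC taken as the small-sample corrected AIC.

```python
import math


def information_criteria(log_lik: float, k: int, n: int) -> dict:
    """Return AIC, CAIC, BIC and HQIC for a fitted model.

    log_lik : maximized log-likelihood of the fitted model
    k       : number of estimated parameters (hypothetical here)
    n       : number of observations (hypothetical here)
    """
    if n - k - 1 <= 0:
        raise ValueError("CAIC requires n > k + 1")
    if n <= 1:
        raise ValueError("HQIC requires n > 1")

    aic = -2.0 * log_lik + 2.0 * k
    # Corrected AIC: AIC plus a small-sample penalty term.
    caic = aic + (2.0 * k * (k + 1)) / (n - k - 1)
    bic = -2.0 * log_lik + k * math.log(n)
    hqic = -2.0 * log_lik + 2.0 * k * math.log(math.log(n))
    return {"AIC": aic, "CAIC": caic, "BIC": bic, "HQIC": hqic}


# Illustrative values only; not taken from the Kathmandu dataset.
criteria = information_criteria(log_lik=-100.0, k=4, n=50)
for name, value in criteria.items():
    print(f"{name}: {value:.3f}")
```

For every one of these criteria, a smaller value indicates a better trade-off between fit and model complexity, so the candidate distribution with the lowest values is preferred.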