Background: The air quality of an area depends on the particulate matter (PM) and hazardous gases present in the air. Low-cost PM and gas sensors are deployed at target locations to monitor air quality, read environmental data, and transmit it to local servers through an IoT device. Because low-cost sensors have limited sensing capacity and are therefore not fully reliable, their readings are calibrated with meteorological data from the nearest meteorological centre. The calibrated readings sent to the server can then be analyzed with machine learning (ML) models, which help predict the risk of asthma in a particular area. The risk of asthma is directly related to the air quality of the surroundings. Air quality in industrial areas is observed to be much worse than in non-industrial belts. Monitoring it is challenging because the pollution is non-uniform, concentrated in isolated pockets around the industry, and emitted mostly from chimneys. The relevant pollutants are PM2.5 and PM10, as well as gases such as NO2 (nitrogen dioxide), NH3 (ammonia), SO2 (sulfur dioxide), CO (carbon monoxide), O3 (ozone), and benzene. These are the most hazardous gases generally emitted by common heavy industries such as iron and steel. In this article, the researchers considered the industrial belt of the Asansol-Durgapur region of West Bengal, India, and predicted the risk of asthma attacks for the test dataset. The experiment was carried out on 10 different supervised machine learning (SML) models as well as semi-supervised machine learning (SSML) models. The SML models were further refined through hyper-parameter tuning, which improved the results for some of the models.
The results were compared with existing literature that considers the same external environment from which the meteorological data were collected and that uses similar ML models. The proposed approach outperformed the existing literature, as shown in the results and analysis section of the article.
Methods: The study evaluated ML models, both supervised and semi-supervised, to assess pollution levels. Relevant features were selected while less relevant ones were discarded. The accuracy levels of the different ML algorithms were compared in the results. The most effective model for an IoT system was chosen to maximize accuracy. In semi-supervised learning, feature selection followed the supervised approach, but testing was akin to unsupervised learning. The results were compared with those from supervised learning, enhancing reliability.
Results: The results from the various classifiers are presented in tables, after the independent parameter ozone was removed. The classifier outputs were verified using k-fold validation, with k set to 5 or 10. The best outcome in each table is shown in bold.
Method: In this research work, the researchers considered 9 different ML models and used them as both supervised and semi-supervised models to determine the pollution level of a given area. The researchers also selected the most relevant features and discarded the less relevant ones. For the SML algorithms, the accuracy of each ML algorithm was determined and is reported in the result analysis section. The most effective ML model was chosen for the proposed embedded system to maximize accuracy. For the semi-supervised algorithm, feature selection is done in the same way as for the supervised algorithm.
In this case, training proceeds as in the SML algorithm, but the testing phase resembles an unsupervised machine learning algorithm: the decision parameter is predicted and then matched against the results previously obtained with the SML algorithm. This approach is considerably more reliable than the SML algorithm alone.
Conclusion: This study focused on predicting asthma risk in the Asansol-Durgapur industrial belt, India, using low-cost PM and gas sensors. Calibrating the sensor data with meteorological inputs enhanced accuracy. ML models predicted the risk and were refined through hyper-parameter tuning. Comparative analysis showed performance superior to the existing literature, emphasizing the importance of precise air quality monitoring. While the study offers a robust framework for future research, its limitation lies in its area-specific dataset.
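The k-fold validation step mentioned in the Results (k set to 5 or 10) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the nearest-centroid classifier stands in for the unnamed ML models, the two features mimic PM2.5 and PM10 readings, and the data are synthetic toy values, not the study's dataset. The function names (`k_fold_indices`, `fit_centroids`, `predict_centroid`, `cross_val_accuracy`) are hypothetical.

```python
import random
from statistics import mean


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def fit_centroids(X, y):
    """Toy stand-in classifier: store the mean feature vector of each class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, target in zip(X, y) if target == label]
        centroids[label] = [mean(column) for column in zip(*rows)]
    return centroids


def predict_centroid(centroids, x):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    def squared_distance(label):
        return sum((a - b) ** 2 for a, b in zip(x, centroids[label]))
    return min(centroids, key=squared_distance)


def cross_val_accuracy(X, y, k):
    """Hold out each fold once for testing, train on the rest, return fold accuracies."""
    scores = []
    for test_idx in k_fold_indices(len(X), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        model = fit_centroids([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(predict_centroid(model, X[i]) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return scores


# Synthetic toy data (NOT from the study): [PM2.5, PM10] readings with a risk label.
rng = random.Random(1)
X, y = [], []
for _ in range(20):
    X.append([rng.gauss(20, 5), rng.gauss(40, 5)])
    y.append("low")
    X.append([rng.gauss(120, 5), rng.gauss(200, 5)])
    y.append("high")

scores_5 = cross_val_accuracy(X, y, k=5)
scores_10 = cross_val_accuracy(X, y, k=10)
print("mean accuracy, k=5:", mean(scores_5))
print("mean accuracy, k=10:", mean(scores_10))
```

Averaging the per-fold accuracies gives one score per classifier, which is how candidate models could be ranked against each other. In practice a library routine such as scikit-learn's cross-validation helpers would likely replace the hand-written splitting shown here.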