Abstract

<span>Feature engineering (FE) is one of the most important steps in data science research, because it produces the features used in later stages of a study. Due to climate change, research focus in Malaysia is moving towards air quality estimation and the health impacts of air pollution. Malaysia has 66 air quality monitoring (AQM) stations, and the Department of Environment, Malaysia, provides the air quality data for research as Excel worksheets. The data generated by the AQM stations is in a raw custom format, and the sheer number of files makes it virtually impossible to clean and engineer this data manually. Hence, we propose a novel feature engineering algorithm that transforms and combines this data into a usable format. The results show that the proposed algorithm efficiently extracts and combines the hourly and daily values of pollutant and meteorological variables into a usable row format. This algorithm will help researchers who use data from AQM stations in Malaysia, as well as those in other countries that use the same type of AQM station. An implementation of the feature engineering algorithm is available on GitHub (https://github.com/rajasherafgun/featureengineeringaq) under the AFL-3.0 license.</span>
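
The abstract does not specify the raw file layout, so the following is only a minimal sketch of the kind of transformation described, not the authors' algorithm. It assumes a hypothetical layout in which each sheet holds one variable for one station, with one list of hourly readings per date; the station code, variable names, and function names are all illustrative. The sketch reshapes the sheets into one row per station, date, and hour, then derives daily means.

```python
from collections import defaultdict
from statistics import mean

def wide_to_hourly_rows(station, variable, wide_rows):
    """Turn {date: [v_hour0, v_hour1, ...]} into one record per hour."""
    for date, values in wide_rows.items():
        for hour, value in enumerate(values):
            if value is not None:  # skip missing readings
                yield (station, date, hour), variable, value

def combine(sheets):
    """Merge (station, variable, wide_rows) sheets into row format.

    Returns {(station, date, hour): {variable: value}}.
    """
    table = defaultdict(dict)
    for station, variable, wide_rows in sheets:
        for key, var, value in wide_to_hourly_rows(station, variable, wide_rows):
            table[key][var] = value
    return dict(table)

def daily_means(hourly):
    """Average each variable per (station, date) over the available hours."""
    buckets = defaultdict(lambda: defaultdict(list))
    for (station, date, _hour), variables in hourly.items():
        for var, value in variables.items():
            buckets[(station, date)][var].append(value)
    return {key: {var: mean(vals) for var, vals in by_var.items()}
            for key, by_var in buckets.items()}

# Hypothetical input: two sheets for an illustrative station "S01".
sheets = [
    ("S01", "PM10", {"2018-01-01": [40.0, 42.0, None]}),
    ("S01", "Temp", {"2018-01-01": [27.0, 26.0, 25.0]}),
]
hourly = combine(sheets)
daily = daily_means(hourly)
```

Keying every record by station, date, and hour lets pollutant and meteorological readings from separate sheets land in the same row, which is the row format the abstract refers to. Missing readings are simply skipped here; a real pipeline would need an explicit policy for them.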

