Abstract

<span>Feature engineering (FE) is one of the most important steps in data science research, because it produces the features used in subsequent analysis and modelling. Due to climate change, research in Malaysia is shifting towards air quality estimation and the health impacts of air pollution. Malaysia has 66 air quality monitoring (AQM) stations, and the Department of Environment, Malaysia, provides their air quality data for research as Excel worksheets. The data generated by the AQM stations is in a raw, custom format, and the sheer number of files makes manual cleaning and feature engineering virtually impossible. Hence, we propose a novel feature engineering algorithm that transforms and combines this data into a usable format. The results show that the proposed algorithm efficiently extracts and combines the hourly and daily values of pollutant and meteorological variables into a row format suitable for analysis. The algorithm can assist researchers working with data from Malaysia's AQM stations, as well as researchers in other countries that use the same type of AQM station. An implementation of the algorithm is available on GitHub (https://github.com/rajasherafgun/featureengineeringaq) under the AFL-3.0 license.</span>
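The abstract does not describe the raw worksheet layout or the algorithm's internals. As a rough illustration of the kind of reshaping involved, the Python sketch below assumes a hypothetical layout in which each sheet holds one variable for one station, with one row per day and 24 hourly readings. It unpivots those grids, merges all variables into one row per station and hour, and then averages them into one row per station and day. The layout, the station ID `STATION_A`, the variable names `PM10` and `WS`, and the functions `to_hourly_rows` and `to_daily_rows` are all assumptions for illustration, not the authors' implementation (which is in the linked repository).

```python
from collections import defaultdict
from datetime import date, datetime, timedelta
from statistics import mean
from typing import Dict, List, Optional, Tuple

# ASSUMED input layout (hypothetical, not the DOE Malaysia format):
# one sheet per (station, variable); each row is (day, 24 hourly readings),
# with None marking a missing reading.
Sheet = List[Tuple[date, List[Optional[float]]]]


def to_hourly_rows(sheets: Dict[Tuple[str, str], Sheet]) -> List[dict]:
    """Unpivot day-by-hour grids and merge every variable into one row per (station, hour)."""
    merged: Dict[Tuple[str, datetime], dict] = defaultdict(dict)
    for (station, variable), rows in sheets.items():
        for day, readings in rows:
            for hour, value in enumerate(readings):
                ts = datetime(day.year, day.month, day.day) + timedelta(hours=hour)
                merged[(station, ts)][variable] = value
    # Sort by station, then timestamp, so rows come out in chronological order.
    return [
        {"station": station, "timestamp": ts, **values}
        for (station, ts), values in sorted(merged.items(), key=lambda kv: kv[0])
    ]


def to_daily_rows(hourly: List[dict]) -> List[dict]:
    """Average hourly rows into one row per (station, day), skipping missing readings."""
    buckets: Dict[Tuple[str, date], Dict[str, List[float]]] = defaultdict(lambda: defaultdict(list))
    for row in hourly:
        day_bucket = buckets[(row["station"], row["timestamp"].date())]
        for name, value in row.items():
            if name in ("station", "timestamp") or value is None:
                continue
            day_bucket[name].append(value)
    return [
        {"station": station, "date": day, **{name: mean(vals) for name, vals in variables.items()}}
        for (station, day), variables in sorted(buckets.items(), key=lambda kv: kv[0])
    ]


if __name__ == "__main__":
    # Hypothetical example: one day of PM10 (second half missing) and wind speed.
    sheets = {
        ("STATION_A", "PM10"): [(date(2018, 1, 1), [40.0] * 12 + [None] * 12)],
        ("STATION_A", "WS"): [(date(2018, 1, 1), [1.5] * 24)],
    }
    hourly = to_hourly_rows(sheets)
    print(len(hourly), hourly[0])
    print(to_daily_rows(hourly))
```

In this sketch the daily means use whatever hourly readings are present. A real pipeline may also want to apply a data-completeness threshold before accepting a daily value.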
