Abstract

Carbon monoxide (CO) and oxides of nitrogen (NOx) are noxious pollutants associated with combined cycle gas turbine (CCGT) power plants, and they require careful monitoring and control. A publicly available dataset links five years of hourly CCGT emissions to nine recorded operational/environmental variables. This study applies twelve multi-linear regression (MLR) and machine learning (ML) models to rigorously predict CO and NOx for that dataset using a novel analytical sequence. Statistical distributions of the data reveal time shifts in the CO and NOx values that cannot be explained by the recorded variables. The relative weights that certain MLR/ML models apply to the variables show that the variables' importance in influencing CO and NOx predictions varies substantially. Multi-K-fold cross validation is applied to establish the prediction accuracy of each model and to indicate the most suitable data-record splits for training and validation. Trained K-nearest neighbour (KNN) and extreme gradient boosting (XGB) ML models consistently provide the most reliable CO and NOx emission predictions, with the lowest errors, when applied to unseen test data. Consideration of mean errors and of predicted versus measured data trends makes it possible to identify systematic time shifts in the CO and NOx data and to partially correct the MLR/ML predictions for them. The best performing KNN models generate mean average errors of 0.8458 mg/m³ for CO (∼1.9 % of the CO data range) and 5.1234 mg/m³ for NOx (∼5.5 % of the NOx data range) for a two-year period separate from the models' training/validation period.
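
The multi-K-fold cross-validation idea can be illustrated with a minimal Python sketch. It is not the study's implementation: the data are synthetic, the fold counts (4, 5, 10) and the neighbour count are illustrative assumptions, all function names are hypothetical, and the error is computed as a mean absolute error, which may differ from the exact error definition used in the study. A simple KNN regressor is scored on each held-out fold, and the fold-averaged error is then compared across several choices of K.

```python
import random
import statistics


def knn_predict(train_x, train_y, query, n_neighbors=5):
    """Predict a target as the mean of the nearest training targets."""
    # Squared Euclidean distance from the query to every training record.
    scored = sorted(
        ((sum((a - b) ** 2 for a, b in zip(row, query)), target)
         for row, target in zip(train_x, train_y)),
        key=lambda pair: pair[0],
    )
    nearest = scored[:n_neighbors]
    return sum(target for _, target in nearest) / len(nearest)


def k_fold_mae(x, y, n_folds, n_neighbors=5, seed=0):
    """Mean absolute error averaged over n_folds held-out folds."""
    indices = list(range(len(x)))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into n_folds roughly equal folds.
    folds = [indices[i::n_folds] for i in range(n_folds)]
    fold_errors = []
    for held_out in folds:
        held = set(held_out)
        train_idx = [i for i in indices if i not in held]
        train_x = [x[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        errors = [
            abs(knn_predict(train_x, train_y, x[i], n_neighbors) - y[i])
            for i in held_out
        ]
        fold_errors.append(statistics.fmean(errors))
    return statistics.fmean(fold_errors)


def multi_k_fold(x, y, fold_counts=(4, 5, 10)):
    """Repeat K-fold validation for several fold counts (illustrative values)."""
    return {k: k_fold_mae(x, y, k) for k in fold_counts}


def make_synthetic(n=200, seed=1):
    """Synthetic stand-in data: two input variables and one noisy target."""
    rng = random.Random(seed)
    x = [[rng.random(), rng.random()] for _ in range(n)]
    y = [3.0 * a - 2.0 * b + rng.gauss(0, 0.05) for a, b in x]
    return x, y


if __name__ == "__main__":
    x, y = make_synthetic()
    data_range = max(y) - min(y)
    for k, mae in multi_k_fold(x, y).items():
        # Express the error as a share of the data range, as the abstract does.
        print(f"{k}-fold: MAE={mae:.4f} ({100 * mae / data_range:.1f}% of range)")
```

Comparing the fold-averaged error across fold counts is one way to judge which training/validation split is most suitable. A final check on a separate, unseen test period would still be needed, as the abstract describes.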
