Abstract

Closed-circuit television inspections, which proactive sewer management requires to track condition deterioration, are expensive and are therefore limited to portions of a sewer network. The data mining approach presented herein is shown to be capable of unlocking information contained within inspection records and of enhancing the pipe inspection practices currently used in the wastewater industry. Predictive models developed using the random forests algorithm are found capable of predicting the condition of individual sewer pipes, so that the uninspected pipes most likely to be in a structurally defective condition state can be identified for future rounds of inspection. Complications posed by the class imbalance common in inspection datasets are overcome by first framing the classification task as binary (pipes are in either good or bad structural condition) and then using the receiver operating characteristic (ROC) curve to establish alternative cutoffs for the predicted class probability. In a case study application to the City of Guelph, Ontario, Canada, the random forests algorithm achieved a stratified test set false negative rate of 18%, a false positive rate of 27% and an excellent area under the ROC curve of 0.81. The novel inclusion of condition information for pipes attached at the upstream or downstream manhole of an individual pipe enhances the predictive power for bad pipes, the minority class of interest: the false negative rate falls to 11%, the false positive rate falls to 25% and the area under the ROC curve rises to 0.85. An area under the ROC curve >0.80 indicates that random forests are an “excellent” choice for predicting the condition of individual pipes in a sewer network.
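
The idea of replacing the default 0.5 probability cutoff with one read off the ROC curve can be illustrated with a minimal sketch. It is not the authors' implementation: the labels and predicted probabilities are hypothetical toy values standing in for random forest output, the function names are invented for illustration, and the selection rule (maximising Youden's J, i.e. 1 minus the false negative rate minus the false positive rate) is an assumption, since the abstract does not say which criterion was used.

```python
def rates_at_cutoff(y_true, p_bad, cutoff):
    """Return (false_negative_rate, false_positive_rate) when a pipe is
    labelled bad (1) if its predicted probability is >= cutoff."""
    fn = sum(1 for y, p in zip(y_true, p_bad) if y == 1 and p < cutoff)
    fp = sum(1 for y, p in zip(y_true, p_bad) if y == 0 and p >= cutoff)
    n_bad = sum(y_true)
    n_good = len(y_true) - n_bad
    return fn / n_bad, fp / n_good


def best_cutoff(y_true, p_bad):
    """Try every distinct predicted probability as a candidate cutoff and
    keep the one maximising Youden's J = 1 - FNR - FPR (assumed criterion)."""
    best = None
    for c in sorted(set(p_bad)):
        fnr, fpr = rates_at_cutoff(y_true, p_bad, c)
        j = 1.0 - fnr - fpr
        if best is None or j > best[0]:
            best = (j, c, fnr, fpr)
    return best[1], best[2], best[3]


# Hypothetical, imbalanced toy data: 1 = bad pipe (minority), 0 = good pipe.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
# Hypothetical predicted probabilities of the "bad" class.
p_bad = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.45, 0.40, 0.60]

# Error rates at the conventional 0.5 cutoff.
default_fnr, default_fpr = rates_at_cutoff(y_true, p_bad, 0.5)

# Error rates at the cutoff chosen from the ROC operating points.
cutoff, fnr, fpr = best_cutoff(y_true, p_bad)

print(f"cutoff 0.50: FNR={default_fnr:.3f}, FPR={default_fpr:.3f}")
print(f"cutoff {cutoff:.2f}: FNR={fnr:.3f}, FPR={fpr:.3f}")
```

On this toy data the 0.5 cutoff misses one of the two bad pipes, whereas the lower cutoff catches both at the cost of one extra false positive. This is the kind of trade-off that matters when the minority class is the one of interest.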
