Abstract
Using empirical models to predict whether pipe sections have defects can save inspection costs and potentially prevent oil spills. Optimal Classification Tree (OCT) formulations can offer a desirable combination of interpretability and prediction accuracy on unseen pipes. State-of-the-art OCT formulations have enabled researchers to solve decision tree problems to optimality rather than relying on traditional greedy approaches, which are generally sub-optimal. Yet recently proposed formulations also have limitations. Some of the most recent formulations require a large number of decision variables and constraints, which leads to computational inefficiency. Previous formulations can also admit optimal solutions with undesirable or invalid tree structures, and whether these arise may depend on the particular software implementation. Additionally, some formulations always grow a full tree even when a more parsimonious tree would be preferable. This article proposes the Modified Optimal Classification Tree (M-OCT) formulation with novel leaf-branch-interaction constraints, which can stabilize the previous formulation and reduce the chance of invalid tree structures when generating optimal trees. By incorporating the binary encoding of thresholds from a previous article, we reduce the total number of binary variables. We then extend M-OCT into a novel formulation called the Binary Node Penalty Optimal Classification Tree (BNP-OCT), with binary splits and node complexity constraints that improve efficiency in standard branch-and-cut solvers and help prevent overfitting when learning optimal tree models. We compare the proposed methods with alternatives, including standard formulations, on 15 standard data sets. In addition, we use 750 test cases to compare the computational stability of pre-existing formulations with that of formulations incorporating the proposed leaf-branch constraints.
We demonstrate that the proposed formulation offers advantages in accuracy, computational efficiency, and structural stability. We also describe how the proposed methods achieve 94% classification accuracy on balanced test sets of unseen pipes.
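The binary encoding of thresholds mentioned above can be illustrated with a small counting sketch. It assumes that each branch node chooses one of n candidate split thresholds and that the chosen index is stored in ceil(log2 n) binary variables instead of one indicator per candidate. The function names are hypothetical and are not taken from the M-OCT or BNP-OCT formulations; the sketch only shows why the variable count shrinks.

```python
def one_hot_var_count(num_thresholds: int) -> int:
    """Binary variables needed if every candidate threshold has its own indicator."""
    return num_thresholds


def binary_var_count(num_thresholds: int) -> int:
    """Binary variables needed to store a threshold index in base 2 (ceil(log2 n), at least 1)."""
    if num_thresholds < 1:
        raise ValueError("need at least one candidate threshold")
    return max(1, (num_thresholds - 1).bit_length())


def encode_threshold_index(index: int, num_bits: int) -> list[int]:
    """Bits b_k, least significant first, such that index == sum(b_k * 2**k)."""
    if not 0 <= index < 2 ** num_bits:
        raise ValueError("index does not fit in the given number of bits")
    return [(index >> k) & 1 for k in range(num_bits)]


def decode_threshold_index(bits: list[int]) -> int:
    """Recover the threshold index from its bits (the linear expression sum(b_k * 2**k))."""
    return sum(bit << k for k, bit in enumerate(bits))


def is_valid_code(bits: list[int], num_thresholds: int) -> bool:
    """When n is not a power of two, some bit patterns name no threshold.
    In a MIP this would be a constraint like sum(2**k * b_k) <= n - 1 (an assumption here)."""
    return decode_threshold_index(bits) <= num_thresholds - 1


if __name__ == "__main__":
    n = 100  # hypothetical number of candidate thresholds at one branch node
    bits_needed = binary_var_count(n)
    print(one_hot_var_count(n), bits_needed)  # 100 7
    bits = encode_threshold_index(42, bits_needed)
    print(bits, decode_threshold_index(bits))  # [0, 1, 0, 1, 0, 1, 0] 42
```

Under these assumptions, 100 candidate thresholds need 7 binary variables rather than 100. Because 7 bits can represent 128 codes, a side constraint such as the one in `is_valid_code` would be needed to exclude the 28 codes that correspond to no threshold.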