In a refinery, accurate estimation of feed properties is crucial for precise real-time optimization (RTO), yet developing models that estimate crude feed properties in real time from plant measurements remains a challenge. Plant variables are collected at different frequencies, and the resulting unbalanced dataset hinders estimation from plant data. To address these challenges, this study proposes two novel algorithms, clustering co-training semi-supervised learning (CCoT-SSL) and stacked co-training semi-supervised learning (SCoT-SSL), to solve the unbalanced dataset problem. In contrast to prior work, this study tackles the estimation of multivariate attributes when the available data comprise a small labeled set and a large unlabeled set with a “cascade shape” structure, and it illustrates the proposed methods by estimating crude feed properties for crude distillation units (CDUs). Results indicate that CCoT-SSL excels when the base learners are judiciously selected. Such estimation is essential for the real-time optimization of CDU and refinery operations.