Data mining algorithm for pre-processing biopharmaceutical drug product manufacturing records

Gioele Casola, Christian Siegmund, Markus Mattern, Hirokazu Sugiyama

doi:10.1016/j.compchemeng.2018.12.001

Abstract

Data quality plays a crucial role in reliable decision-making when improving processes and operations under uncertainty. We present a data mining-based algorithm for robustly pre-processing the manufacturing records of biopharmaceutical batch processes. The algorithm identifies the time intervals in which the process is in commercial operation and characterizes process failures automatically. An approximate string-matching algorithm, a decision tree classifier, and constrained clustering are applied to sequence the raw data, classify the noise, and identify each individual batch; finally, process failures are characterized. In a case study, the algorithm was applied to the records of a process known as “cleaning- and sterilizing-in-place”, which is an essential process in the manufacturing environment. The algorithm was trained on the outcome of state-of-the-art manual pre-processing, and its application reduced the execution time of the activity to 11.7% while maintaining high data quality and integrity.
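As a rough illustration of the approximate string-matching step only, the following minimal Python sketch maps noisy record labels onto a fixed vocabulary of step names. The step names, the `normalize_label` helper, the sample records, and the 0.7 similarity cutoff are all hypothetical and are not taken from the paper; the standard-library `difflib` matcher stands in for whichever matching algorithm the authors used, and the decision tree and constrained clustering stages are not shown.

```python
from difflib import get_close_matches

# Hypothetical canonical step names (assumption: the abstract does not list
# the vocabulary used in the actual manufacturing records).
CANONICAL_STEPS = ["rinse", "caustic wash", "acid wash", "sterilization", "cooling"]


def normalize_label(raw, cutoff=0.7):
    """Return the closest canonical step name, or None if nothing is similar enough.

    The cutoff is an illustrative value, not one reported in the paper.
    """
    matches = get_close_matches(raw.strip().lower(), CANONICAL_STEPS, n=1, cutoff=cutoff)
    return matches[0] if matches else None


# Hypothetical raw entries with typos, spelling variants, and an unrelated note.
raw_records = ["Rinse", "caustic wsh", "sterilisation", "operator note"]
sequenced = [normalize_label(r) for r in raw_records]
print(sequenced)
```

Entries that match no canonical name come back as `None` here; in the approach the abstract describes, deciding what counts as noise is the job of the decision tree classifier rather than of the string matcher.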

Full Text