Abstract

There is growing interest in mining and handling big data, which has been accumulating rapidly in the repositories of bioprocess industries. Biopharmaceutical industries are no exception: the implementation of advanced process control strategies based on multivariate monitoring techniques in biopharmaceutical production has led to the generation of large amounts of data. Real-time measurements of critical quality and performance attributes collected during production can be highly useful for understanding and modelling biopharmaceutical processes. Data mining can facilitate the extraction of meaningful relationships pertaining to these bioprocesses and predict the performance of future cultures. This review evaluates the suitability of various metaheuristic methods available for data pre-processing, which involves handling missing data, visualising the data and reducing its dimensionality; and for data processing, which focuses on modelling the data and optimising these models, in the context of biopharmaceutical process development. The advantages and associated challenges of employing different methodologies in the pre-processing and processing of the data are discussed. In light of these evaluations, a summary guideline is proposed for the handling and analysis of data generated in biopharmaceutical process development.

Highlights

  • One of the major challenges in biopharmaceutical production is that, unlike small-molecule generic products, it is impossible to manufacture identical copies of biologic products despite meticulously following well-defined analytical characterization and manufacturing techniques [1]. This creates a need for continuous real-time quality control and assurance in biopharmaceutical manufacturing, which underscores the importance of implementing process analytical technology (PAT) measures in the manufacturing process.

  • Although a multitude of imputation methods is available, as discussed in the previous subsection, a few key aspects of biopharmaceutical process development (BPD) datasets still make gap-filling challenging.


Summary

Introduction

One of the major challenges in biopharmaceutical production is that, unlike small-molecule generic products, it is impossible to manufacture identical copies of biologic products despite meticulously following well-defined analytical characterization and manufacturing techniques [1].

The patterns of missing information take a wide variety of forms, such as (1) the univariate pattern, i.e. gaps are present in only one parameter; (2) the unit non-response pattern, i.e. no data are recorded for the duration of a single experiment in a batch; (3) the monotone pattern, i.e. data are missing from a specific point onwards because a change in sampling regime or instrument leads to the omission of certain measurements; (4) the general pattern, i.e. missing data can be predicted from the observed values using a linear model; (5) the planned missing pattern, i.e. a measurement is intentionally omitted as part of the experimental design; or (6) the latent variable pattern, i.e. a missing-data structure that explains the unavoidable differences between the imputed and the actual values of the missing data point. The handling of missing data becomes an essential step in data pre-processing, prior to performing any further analysis. Temporal patterns such as trends and seasonal, cyclic or irregular variations have been reported to make the handling of missing data in time series challenging [12]. Approaches to handling missing data can be broadly classified into four categories (Fig. 1): (1) conventional methods, including complete case analysis and the ignoring or deletion of data with missing components [6, 14], (2) imputation-based methods including
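To make some of the pattern definitions and the first two handling categories concrete, here is a minimal Python sketch. It assumes a table held as a list of rows in sampling order, one column per process parameter, with `None` marking a gap; the function names and the example data are hypothetical and not taken from the review. Only the univariate, unit non-response and monotone patterns are checked, because they can be read directly off the gap layout, and column-mean imputation is used purely as the simplest stand-in for the imputation-based methods the review surveys.

```python
from statistics import mean
from typing import List, Optional

Row = List[Optional[float]]  # one observation; None marks a gap (assumed encoding)


def _is_monotone(data: List[Row], j: int) -> bool:
    """True if, once column j has a gap, every later value in j is also missing."""
    seen_gap = False
    for row in data:
        if row[j] is None:
            seen_gap = True
        elif seen_gap:
            return False  # an observed value after a gap breaks monotonicity
    return True


def missing_patterns(data: List[Row]) -> List[str]:
    """Report which of three simple missing-data patterns a table shows."""
    if not data:
        return []
    n_cols = len(data[0])
    gappy_cols = [j for j in range(n_cols) if any(row[j] is None for row in data)]
    labels = []
    if len(gappy_cols) == 1:
        labels.append("univariate")  # gaps confined to one parameter
    if any(all(v is None for v in row) for row in data):
        labels.append("unit non-response")  # an entire row recorded nothing
    if gappy_cols and all(_is_monotone(data, j) for j in gappy_cols):
        labels.append("monotone")  # data missing from some point onwards
    return labels


def complete_cases(data: List[Row]) -> List[Row]:
    """Conventional handling: drop every row that has any missing value."""
    return [row for row in data if all(v is not None for v in row)]


def mean_impute(data: List[Row]) -> List[Row]:
    """Simple imputation: fill each gap with its column's observed mean."""
    if not data:
        return []
    col_means = []
    for j in range(len(data[0])):
        observed = [row[j] for row in data if row[j] is not None]
        # A column with no observed values keeps None as its fill value.
        col_means.append(mean(observed) if observed else None)
    return [[col_means[j] if v is None else v for j, v in enumerate(row)] for row in data]


# Hypothetical batch record: columns are [glucose, titre]; the titre assay
# stops after the second sample, giving a univariate, monotone gap.
batch = [
    [5.0, 0.1],
    [4.2, 0.4],
    [3.1, None],
    [2.5, None],
]
print(missing_patterns(batch))  # ['univariate', 'monotone']
print(complete_cases(batch))    # [[5.0, 0.1], [4.2, 0.4]]
print(mean_impute(batch))       # titre gaps filled with the observed titre mean
```

Column-mean imputation ignores temporal structure such as trends, which is one reason gaps in time series are harder to fill well; it appears here only as a contrast to deletion, which in this example discards half of the batch.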
