Total sialic acid (TSA) content in biotherapeutic proteins is often a critical quality attribute because it affects drug efficacy. Traditional wet chemical assays to quantify TSA in biotherapeutic proteins during cell culture typically take several hours or longer because of the complexity of the assay, which involves isolating sialic acid from the protein of interest, followed by sample preparation and chromatography-based separation for analysis. Here, we developed a machine learning model-based technology to rapidly predict TSA during cell culture from routinely measured process parameters. The technology features a user interface in which users only have to upload cell culture process parameters as input variables; the model's TSA predictions are then displayed instantly on a dashboard. In this study, multiple machine learning algorithms were assessed on our dataset, and the Random Forest model was identified as the most promising. Feature importance analysis from the Random Forest model revealed that attributes such as viable cell density (VCD), glutamate, ammonium, phosphate, and basal medium type are critical for predictions. Notably, while the model demonstrated strong predictive performance by Day 14 of observation, challenges remain in forecasting TSA values at the edges of the calibration range. This work illustrates the potential of machine learning and soft sensors in bioprocessing and introduces a rapid, efficient tool for sialic acid prediction. Future work may focus on data augmentation to further improve model precision and on exploring process control capabilities.