Biopharmaceutical manufacturing is a rapidly growing industry with impact in virtually all branches of medicine. Biomanufacturing processes require close monitoring and control in the presence of complex bioprocess dynamics with many interdependent factors, as well as extremely limited data due to the high cost of experiments and the novelty of personalized bio-drugs. We develop a new model-based reinforcement learning framework that can achieve human-level control in low-data environments. A dynamic Bayesian network is used to capture causal interdependencies between factors and predict how the effects of different inputs propagate through the pathways of the bioprocess mechanisms. This model is interpretable and enables the design of process control policies that are robust against model risk. We present a computationally efficient, provably convergent stochastic gradient method for optimizing such policies. Validation is conducted on a realistic application with a multidimensional, continuous state variable.

History: Accepted by Bruno Tuffin, Area Editor for Simulation.
Funding: This work was partially supported by the National Institute of Standards and Technology [Grant 70NANB17H002].
Supplemental Material: The online appendix is available at https://doi.org/10.1287/ijoc.2022.1232.