Cells act as physical computational programs that use input signals to orchestrate molecular-level protein-protein interactions (PPIs), which generate and respond to forces and ultimately shape all of their physiological and pathophysiological behaviors. Genome editing and molecular drugs targeting PPIs hold great promise for treating diseases. Linking genes and molecular drugs to protein-driven cellular behaviors is a key yet challenging problem because of the wide range of spatial and temporal scales involved. Building predictive spatiotemporal modeling systems that can describe the dynamic behaviors of cells perturbed by genome editing and molecular drugs, at the intersection of biology, chemistry, physics, and computer science, would greatly accelerate pharmaceutical advances. Here, we review the mechanical roles of cytoskeletal proteins in orchestrating cellular behaviors, survey significant advances in biophysical modeling, and discuss the limitations of these models. Then, by integrating generative artificial intelligence (AI) with spatiotemporal multiscale biophysical modeling, we propose a computational pipeline for developing virtual cells that can simulate and evaluate the therapeutic effects of drugs and genome editing technologies on diverse dynamic cellular behaviors and could have broad biomedical applications. Such virtual cell modeling systems might revolutionize modern biomedical engineering by shifting much of the painstaking wet-laboratory effort to computer simulations, substantially saving time and easing the financial burden on the pharmaceutical industry.