Combining computer vision with prediction of ground reaction forces (GRF), a previous study (OpenCap) performed markerless motion capture and musculoskeletal simulation using two smartphones. A more recent approach can reconstruct 3D human motion from a single video without calibration, which may further simplify motion capture. However, it has not been combined with musculoskeletal simulation, and its validity is unclear. Therefore, the purpose of this study was to determine the validity of musculoskeletal simulation based on a monocular vision approach. An open-source dataset containing motion capture and video data of gait from 10 healthy participants was used. Human motion was reconstructed from each video using the Skinned Multi-Person Linear (SMPL) model. Virtual marker data were generated by extracting the positions of SMPL skin vertices. Inverse kinematics, GRF prediction (monocular vision approach only), inverse dynamics, and static optimization were performed with a musculoskeletal model for both the experimental motion capture data and the virtual markers generated from the videos. Mean absolute errors (MAE) between the motion capture based and monocular vision based simulation outcomes were calculated. The MAE were 8.4° for joint angles, 5.0% bodyweight for GRF, 1.1% bodyweight × height for joint moments, and 0.11 for the estimated activations of 16 muscles. Overall, the MAE were larger than those of OpenCap, although some were comparable. With the monocular vision approach, motion capture and musculoskeletal simulation can be performed without preparation, which may help clinicians quantify gait in daily assessments.
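
The MAE comparison described above can be illustrated with a minimal sketch. It assumes the two waveforms have already been resampled to the same time points. The function name and the joint angle values are hypothetical and are not taken from the study, and how the errors were averaged across participants, trials, and variables is not specified here.

```python
from typing import Sequence


def mean_absolute_error(reference: Sequence[float], estimate: Sequence[float]) -> float:
    """Return the mean absolute difference between two equal-length waveforms."""
    if len(reference) == 0 or len(reference) != len(estimate):
        raise ValueError("waveforms must be non-empty and of equal length")
    # Average the point-by-point absolute differences.
    return sum(abs(r - e) for r, e in zip(reference, estimate)) / len(reference)


# Hypothetical knee flexion angles (degrees) sampled at the same five time points.
mocap_angles = [5.0, 20.0, 15.0, 40.0, 60.0]   # motion capture based (reference)
video_angles = [9.0, 14.0, 21.0, 48.0, 52.0]   # monocular vision based (estimate)

mae = mean_absolute_error(mocap_angles, video_angles)
print(f"MAE: {mae:.1f} deg")
```

The same function would apply to GRF, joint moments, or muscle activations, provided both signals are expressed in the same normalized units (for example, % bodyweight).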