Introduction
Combined action observation and motor imagery training (AO+MI training), which involves motor imagery during action observation and physical training, has been attracting attention as an effective strategy for learning motor skills. However, little has been reported on the effects of AO+MI training. In the present study, we compared the effects of AO+MI training with those of physical training on upper-extremity performance.

Materials and methods
Ninety-six healthy participants were randomly assigned to either the control group or the experimental group. Sport stacking, which is often used to evaluate upper-extremity performance, was adopted as the task. The experiment ran over three days, with 20 min of training per day. The control group performed only physical training, while the experimental group performed four 5-min AO+MI training sessions. The time taken to complete one sport stacking attempt (task completion time) was defined as the index of the speed of upper-extremity performance, and the number of fallen cups as the index of its accuracy. The outcomes were compared within each group and between the two groups.

Results
Both AO+MI training and physical training reduced task completion time and increased the number of fallen cups. There were no significant differences between the groups in the degree of change.

Conclusion
The results of the present study showed that AO+MI training and physical training had almost the same influence on upper-extremity performance in the early stages of learning sport stacking. This result suggests that AO+MI training may be an effective and low-burden training method for participants.