Hybrid brain-computer interfaces (hBCIs) that combine motor imagery (MI) with steady-state visual evoked potentials (SSVEPs) have been shown to outperform brain-computer interfaces (BCIs) based on MI or SSVEP alone. In most hBCI studies, subjects are required to focus their attention on flickering light-emitting diodes (LEDs) or blocks while imagining body movements. However, when performed concurrently, these two classical tasks are poorly correlated with each other. It is therefore necessary to reduce the task complexity of such systems and improve their user-friendliness. To this end, this study proposes a novel hybrid BCI that combines MI and intermodulation SSVEPs. In the proposed system, images of both hands flicker at the same frequency (30 Hz) but grasp at different frequencies (1 Hz for the left hand and 1.5 Hz for the right hand), producing different intermodulation frequencies for encoding targets. In addition, observing the grasping movements can help subjects perform the MI task. The two types of brain signals are classified independently and then fused by a scoring mechanism based on the probability distribution of relevant parameters. In online verification, the average accuracies of 12 healthy subjects and 11 stroke patients were 92.40 ± 7.45% and 73.07 ± 9.07%, respectively. The average accuracies of 10 healthy subjects in the MI, SSVEP, and hybrid tasks were 84.00 ± 12.81%, 80.75 ± 8.08%, and 89.00 ± 9.94%, respectively. The high recognition accuracy supports the feasibility and robustness of the proposed system. This study provides a novel and natural paradigm for a hybrid BCI based on MI and SSVEP.
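To illustrate how a shared flicker frequency and two different grasp frequencies can yield distinct target codes, the following minimal sketch lists candidate intermodulation components. It assumes the common sideband form f_flicker ± k·f_grasp; the function name, the `max_order` parameter, and the choice of two sideband orders are hypothetical and are not taken from the study, which may analyze a different set of components.

```python
def intermodulation_frequencies(flicker_hz, grasp_hz, max_order=2):
    """List sideband frequencies flicker_hz +/- k * grasp_hz for k = 1..max_order.

    Assumption: intermodulation components follow the f1 +/- k*f2 form.
    The components actually used in the study are not specified here.
    """
    freqs = set()
    for k in range(1, max_order + 1):
        freqs.add(flicker_hz + k * grasp_hz)
        freqs.add(flicker_hz - k * grasp_hz)
    return sorted(freqs)


# Values from the abstract: 30 Hz flicker, 1 Hz (left) and 1.5 Hz (right) grasp.
left = intermodulation_frequencies(30.0, 1.0)
right = intermodulation_frequencies(30.0, 1.5)

print("left hand: ", left)
print("right hand:", right)
print("shared components:", sorted(set(left) & set(right)))
```

Under this assumption, the two hands share no sideband up to the second order, so either hand could in principle be identified from the spectrum around 30 Hz. At the third order the sets would overlap (27 Hz and 33 Hz appear for both), which is why the order considered matters in such a scheme.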