Augmented reality (AR) has demonstrated clear benefits for manual assembly tasks by delivering intuitive guidance directly on the workbench, reducing workers' mental load and shortening operation times. Nevertheless, current AR-assisted assembly mainly superimposes visual instructions onto the real scene and assumes that the worker performs exactly as instructed; because the actual execution process is never confirmed, operating errors on the shop floor remain difficult to avoid. To this end, this paper proposes multi-modal, context-aware, on-site assembly stage recognition for human-centric AR assembly. First, a sim-to-real point cloud-based semantic understanding method for assembly stage identification is presented, which recognizes the current sequence stage during the AR assembly process even for weakly textured workpieces. In addition, 2D semantic recognition of on-site images from the RGB-D camera is applied as a complementary cue, yielding robust multi-modal, context-aware validation of the ongoing AR assembly stage. On this basis, a context-aware closed-loop AR assembly system is built that confirms the actual assembly result automatically, relieving workers of the burden of activating the next assembly instruction and of verifying the current status during operation. Finally, extensive experiments are carried out, and the results show that the proposed context-aware AR assembly system can monitor the on-site sequence stage while providing human-centric AR assembly guidance.
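To give a rough flavor of the multi-modal stage validation described above, the sketch below fuses a per-stage probability vector from a point cloud branch with one from an image branch and decides whether the AR system may advance to the next instruction. This is only a minimal illustration: the late-fusion rule, the weights, the confidence threshold, and the number of stages are all hypothetical assumptions, not the paper's actual models or parameters.

```python
import numpy as np

# Hypothetical late-fusion sketch of multi-modal assembly stage validation.
# Weights, threshold, and stage count are illustrative assumptions only.

NUM_STAGES = 6  # assumed number of assembly sequence stages

def fuse_stage_predictions(pc_probs: np.ndarray,
                           img_probs: np.ndarray,
                           pc_weight: float = 0.6,
                           confidence_threshold: float = 0.7):
    """Fuse per-stage probabilities from the point cloud branch and the
    image branch; return (stage_id, confident), where `confident` tells
    the closed-loop AR system whether to activate the next instruction."""
    fused = pc_weight * pc_probs + (1.0 - pc_weight) * img_probs
    stage_id = int(np.argmax(fused))
    confident = bool(fused[stage_id] >= confidence_threshold)
    return stage_id, confident

# Example: the image branch reinforces the point cloud branch's
# moderately confident prediction of stage 2.
pc_probs  = np.array([0.05, 0.05, 0.60, 0.20, 0.05, 0.05])
img_probs = np.array([0.01, 0.02, 0.90, 0.04, 0.02, 0.01])
stage, ok = fuse_stage_predictions(pc_probs, img_probs)
print(f"stage={stage}, advance_instruction={ok}")  # stage=2, advance_instruction=True
```

In this kind of scheme, a weighted late fusion lets either modality compensate for the other: the geometry-driven point cloud branch can dominate for weakly textured workpieces, while the image branch supplies a complementary cue when the depth data are less informative.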