Deep convolutional neural networks have achieved high accuracy for single online handwritten Chinese character recognition (SOLHCCR). In real application scenarios, however, users typically write multiple characters to form a complete sentence, and the preceding context holds significant potential for improving the accuracy, robustness, and efficiency of recognition. In this work, we first propose a simple and straightforward model, the vanilla compositional network (VCN), which couples a convolutional neural network with a sequence-modeling architecture (i.e., a recurrent neural network or Transformer) to exploit the preceding context of each handwritten character. Although VCN performs much better than previous state-of-the-art SOLHCCR models, it is inherently a two-stage architecture: because it relies heavily on contextual information, it is fragile when confronted with poorly written characters, such as sloppy writing or missing and broken strokes. To improve the robustness of online handwritten Chinese character recognition (OLHCCR), we further propose a novel deep spatial and contextual information fusion network (DSCIFN). It uses an autoregressive framework pre-trained on a large-scale sentence corpus as its backbone and tightly integrates the spatial features of handwritten characters with their preceding context in a multi-layer fusion module. To verify the effectiveness of the proposed models, we reorganize existing data into a new dataset, OHCCC, which pairs each online handwritten Chinese character with its preceding context. Extensive experimental results demonstrate that DSCIFN achieves state-of-the-art performance and is substantially more robust than VCN and previous SOLHCCR models. In-depth empirical analysis and a case study indicate that DSCIFN can significantly improve handwriting-input efficiency because it can recognize a handwritten Chinese character accurately without requiring its complete strokes.
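For illustration only, the sketch below shows the general VCN idea described above: a per-character CNN encoder combined with a Transformer over the previously written characters. All module names, layer sizes, the fusion-by-addition step, and the vocabulary size are assumptions for this sketch, not the architecture reported in the paper.

```python
import torch
import torch.nn as nn


class VanillaCompositionalNetSketch(nn.Module):
    """Hypothetical sketch: a CNN encodes the current character image, a
    Transformer encodes the preceding character context, and the two
    representations are fused for classification (details assumed)."""

    def __init__(self, vocab_size=3755, d_model=256, ctx_len=16):
        super().__init__()
        # CNN over a rendered 1x32x32 image of the current handwriting.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Flatten(), nn.Linear(64 * 8 * 8, d_model),
        )
        # Sequence model over embeddings of the previously written characters.
        self.ctx_embed = nn.Embedding(vocab_size, d_model)
        self.pos_embed = nn.Embedding(ctx_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.ctx_encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.classifier = nn.Linear(d_model, vocab_size)

    def forward(self, char_image, prev_char_ids):
        # char_image: (B, 1, 32, 32); prev_char_ids: (B, T) ids of prior characters.
        spatial = self.cnn(char_image)                             # (B, d_model)
        pos = torch.arange(prev_char_ids.size(1), device=prev_char_ids.device)
        ctx = self.ctx_embed(prev_char_ids) + self.pos_embed(pos)  # (B, T, d_model)
        ctx = self.ctx_encoder(ctx).mean(dim=1)                    # (B, d_model)
        # Naive fusion by addition (an assumption made for this sketch).
        return self.classifier(spatial + ctx)                      # (B, vocab_size)


if __name__ == "__main__":
    model = VanillaCompositionalNetSketch()
    logits = model(torch.randn(2, 1, 32, 32), torch.randint(0, 3755, (2, 16)))
    print(logits.shape)  # torch.Size([2, 3755])
```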