Language conveys both semantic and emotional information. Emotional processing in language cannot be easily explained by current models of word recognition, sentence processing, and discourse comprehension. To uncover the characteristics of emotional processing in written language, the present review integrates findings from behavioral, event-related potential (ERP), and functional magnetic resonance imaging (fMRI) studies of emotional processing at multiple levels: words, sentences, and discourse. First, previous studies have shown both automatic and controlled processes at different stages of emotional word processing. Some early ERP effects (before 300 ms, such as N1/P1, P2, and EPN), as well as the activation of subcortical regions (such as the amygdala), have been taken as evidence for rapid and automatic emotional processing. Some late ERP effects (such as the late positive component after 500 ms) and the recruitment of higher-order brain areas (such as the medial prefrontal cortex and cingulate cortex) have been suggested to reflect controlled processing of emotion words. In addition, emotional connotation can enhance cortical responses at all stages of visual word processing, including the assembly of the visual word form (up to 200 ms), semantic access (around 200 ms), allocation of attentional resources (around 300 ms), contextual analysis (around 400 ms), and sustained processing (around 500 ms). Moreover, the word-reading network is influenced by the emotional network, as manifested by the enhanced engagement of the lexical-semantic network when processing emotional words compared with neutral words. However, the temporal locus of the emotional effect relative to lexical access remains unclear. Another important finding is that emotional words receive prioritized processing whether they are presented in isolation or embedded in context.
The salience of emotional information can override detailed semantic analysis, as indicated by the absence of a semantic violation effect when the violations occurred on emotion words. Also, the emotional dimension of sentences was prioritized when emotional words were embedded in sentences, as indicated by the N400 effect to emotion words following neutral, congruent contexts. Moreover, the influence of an emotional word on sentence processing was sustained, as suggested by its effect on the processing of the words that followed it in the sentence. Finally, we showed that emotion can be induced directly by emotional words or implied by a series of neutral words (for example, "The boy fell asleep and never woke up again"). Both types of emotional sentences activated the emotional brain network (such as the amygdala, insula, and medial prefrontal cortex), which in turn enhanced the involvement of the language network (such as the inferior frontal gyrus and middle temporal gyrus) compared with neutral sentences. Therefore, the language and emotional networks are highly interactive. Overall, the current review summarizes the main findings regarding emotional processing in written language. We have shown that the study of emotional processing in language is important for psycholinguistics and affective neuroscience. We propose that investigating emotional language processing from an embodiment perspective might be a useful approach. Future studies could further investigate the functional connectivity between the emotional and language networks.