Delineation of the primary gross tumor volume (GTV) of nasopharyngeal carcinoma (NPC) is an essential step in radiotherapy planning. In clinical practice, radiation oncologists manually delineate the GTV on the planning CT with the help of diagnostic MRI, because NPC tumors lie close to many important anatomic structures and CT and MRI provide complementary information for determining the boundary of tumor extension. Manual delineation is time-consuming, and registration errors between MRI and CT can reduce delineation accuracy. In this study, we propose a fully automated GTV segmentation method based on CT and MRI that first aligns MRI to CT and then segments the GTV using a multi-modality deep learning model. We collected data from 104 NPC patients with both planning CT and diagnostic MRI scans (T1 and T2 sequences). An experienced radiation oncologist manually delineated the GTV, which was then reviewed by another senior radiation oncologist. A coarse-to-fine cross-modality registration from MRI to CT was then conducted as follows: (1) A rigid transformation was applied to the MRI to roughly align it with the CT at a similar anatomic position. (2) The region of interest (RoI) was cropped from both the CT and the rigid-transformed MRI. (3) A leading cross-modality deformable registration algorithm, DEEDS, was applied to the cropped MRI and CT RoIs to obtain an accurate local alignment. Next, using the CT and registered MRI as the combined input, a multi-modality deep segmentation network based on nnUNet was trained to generate the GTV prediction. Twenty percent of the patients were randomly selected as an unseen testing set for quantitative evaluation. The quantitative NPC GTV segmentation performance is summarized in Table 1. The deep segmentation model using CT alone achieved reasonably high performance, with a Dice score of 76.6% and an average surface distance (ASD) of 1.34 mm.
When both CT and registered MRI were used, the segmentation model improved further, with a Dice score increase of 0.9 percentage points and a relative ASD reduction of 11%, demonstrating the complementary strength of CT and MRI in determining the NPC GTV. Notably, the 77.5% Dice score and 1.19 mm ASD achieved by the multi-modality model are among the top-performing results reported in recent work on automatic NPC GTV segmentation using either CT or MRI. We developed a fully automated multi-modality deep learning model for NPC GTV segmentation. The model can segment the NPC GTV with high accuracy. With further optimization and validation, it has the potential to standardize NPC GTV segmentation and significantly decrease the workload of radiation oncologists in clinical practice.
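As an illustration of the reported metrics, the following minimal Python sketch shows how a Dice score can be computed for a pair of binary masks and how the stated improvements follow from the reported numbers. It is not the evaluation code used in the study: the function names `dice_score` and `relative_reduction` and the toy masks are hypothetical, and the convention that two empty masks score 1.0 is an assumption. Only the values 76.6%, 77.5%, 1.34 mm, and 1.19 mm come from the text.

```python
def dice_score(pred, ref):
    """Dice coefficient between two binary masks given as flat 0/1 sequences."""
    if len(pred) != len(ref):
        raise ValueError("masks must contain the same number of voxels")
    # Voxels labeled as tumor in both the prediction and the reference.
    intersection = sum(1 for p, r in zip(pred, ref) if p and r)
    # Total tumor voxels in the prediction plus total in the reference.
    total = sum(1 for p in pred if p) + sum(1 for r in ref if r)
    # Assumed convention: two empty masks count as perfect agreement.
    return 1.0 if total == 0 else 2.0 * intersection / total


def relative_reduction(baseline, improved):
    """Fractional reduction of an error metric relative to its baseline."""
    return (baseline - improved) / baseline


# Toy masks for illustration only (hypothetical, not study data).
toy_pred = [0, 1, 1, 1, 0, 0]
toy_ref = [0, 0, 1, 1, 1, 0]
toy_dice = dice_score(toy_pred, toy_ref)

# Improvements derived from the values reported in the text.
dice_gain_points = 77.5 - 76.6                  # percentage points
asd_reduction = relative_reduction(1.34, 1.19)  # fraction of the CT-only ASD

print(f"toy Dice: {toy_dice:.3f}")
print(f"Dice gain: {dice_gain_points:.1f} percentage points")
print(f"relative ASD reduction: {asd_reduction:.1%}")
```

The relative ASD reduction is (1.34 − 1.19) / 1.34, which rounds to the 11% quoted above, while the Dice improvement is an absolute difference in percentage points rather than a relative change.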