Deformable registration is required to generate a time-integrated activity (TIA) map, which is essential for voxel-based dosimetry. Conventional iterative registration algorithms driven by anatomical images (e.g., computed tomography (CT)) can produce registration errors in the accompanying functional images (e.g., single photon emission computed tomography (SPECT) or positron emission tomography (PET)). Various deep learning-based registration tools have been proposed, but we found no studies specifically focused on the registration of serial hybrid images. In this study, we introduce CoRX-NET, a novel unsupervised deep learning network designed for deformable registration of hybrid medical images. The CoRX-NET architecture is based on the Swin transformer (ST), which allows it to model complex spatial relationships within images; its self-attention mechanism supports the effective exchange and integration of information across different image regions. To improve the fusion of SPECT and CT features, cross-stitch layers were integrated into the network. The network takes a pair of SPECT/CT images (i.e., a fixed and a moving image) and generates a deformed SPECT/CT image. Two 177Lu-DOTATATE SPECT/CT datasets were acquired at different medical centers: 22 sets from Seoul National University were used for training and internal validation, and 14 sets from Sunway Medical Centre were used for external validation. The performance of the network was compared with that of Elastix and TransMorph using the L1 loss and structural similarity index measure (SSIM) of CT, the SSIM of normalized SPECT, and the local normalized cross correlation (LNCC) of SPECT as metrics. The voxel-wise root mean square errors (RMSE) of TIA were also compared among the methods.
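The cross-stitch idea can be illustrated with a minimal NumPy sketch. It assumes the commonly used formulation of a cross-stitch unit, in which two same-shaped feature maps are mixed by a 2x2 weight matrix that would normally be learned during training. The function name `cross_stitch`, the fixed example weights, and the toy arrays are illustrative assumptions; the abstract does not state where or how CoRX-NET places these layers.

```python
import numpy as np


def cross_stitch(feat_a, feat_b, alpha):
    """Mix two same-shaped feature maps with a 2x2 weight matrix.

    alpha[0] holds the weights that form the new A-branch features,
    alpha[1] the weights that form the new B-branch features.
    In a real network alpha would be a learnable parameter; here it
    is a fixed array purely for illustration.
    """
    feat_a = np.asarray(feat_a, dtype=float)
    feat_b = np.asarray(feat_b, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    if alpha.shape != (2, 2):
        raise ValueError("alpha must have shape (2, 2)")
    if feat_a.shape != feat_b.shape:
        raise ValueError("feature maps must have the same shape")

    # Each output branch is a linear combination of both input branches.
    out_a = alpha[0, 0] * feat_a + alpha[0, 1] * feat_b
    out_b = alpha[1, 0] * feat_a + alpha[1, 1] * feat_b
    return out_a, out_b


# Toy stand-ins for SPECT-branch and CT-branch features (hypothetical data).
spect_feat = np.ones((2, 2))
ct_feat = np.zeros((2, 2))

# Mostly keep each branch's own features, share a little across branches.
mixed_spect, mixed_ct = cross_stitch(
    spect_feat, ct_feat, [[0.9, 0.1], [0.1, 0.9]]
)
```

With an identity weight matrix the two branches pass through unchanged, so the off-diagonal weights control how much information the SPECT and CT branches exchange.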
The ablation study showed that the cross-stitch layers improved SPECT/CT registration performance. Comparing the network without vs. with cross-stitch layers, the layers raised the SSIM (internal validation: 0.9614 vs. 0.9653; external validation: 0.9159 vs. 0.9189) and LNCC (internal validation: 0.7512 vs. 0.7670; external validation: 0.8027 vs. 0.8027) of normalized SPECT images. CoRX-NET with the cross-stitch layers achieved better performance metrics than Elastix and TransMorph, except for CT SSIM in the external dataset. In qualitative analysis of both internal and external validation cases, CoRX-NET consistently produced better SPECT registration results. In addition, CoRX-NET completed SPECT/CT image registration in less than 6 s, whereas Elastix required approximately 50 s on the same PC's CPU. With CoRX-NET, the voxel-wise RMSE of TIA was approximately 27% lower for the kidney and 33% lower for the tumor than with Elastix. This study demonstrates precise SPECT/CT registration using an unsupervised deep learning network. CoRX-NET outperformed existing methods such as Elastix and TransMorph, reducing uncertainties in TIA maps for more accurate dose assessments.
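As a hedged illustration of how a voxel-wise RMSE comparison of this kind could be computed, the sketch below evaluates the RMSE of a TIA map against a reference map inside an organ mask and then the relative reduction between two methods. The helper names `masked_rmse` and `relative_reduction`, the notion of a reference TIA map, and the toy arrays are assumptions for illustration; the abstract does not describe how the reference was obtained, and the numbers below are not the study's data.

```python
import numpy as np


def masked_rmse(tia_est, tia_ref, mask):
    """Voxel-wise RMSE between two TIA maps, restricted to a boolean mask."""
    tia_est = np.asarray(tia_est, dtype=float)
    tia_ref = np.asarray(tia_ref, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    if not mask.any():
        raise ValueError("mask selects no voxels")
    diff = tia_est[mask] - tia_ref[mask]
    return float(np.sqrt(np.mean(diff ** 2)))


def relative_reduction(rmse_baseline, rmse_new):
    """Fractional RMSE reduction of the new method relative to the baseline."""
    return (rmse_baseline - rmse_new) / rmse_baseline


# Toy 2x2x2 volumes (hypothetical values, not measured data).
reference = np.zeros((2, 2, 2))
tia_baseline = np.full((2, 2, 2), 2.0)  # e.g. a map from a baseline method
tia_new = np.full((2, 2, 2), 1.0)       # e.g. a map from a new method
organ_mask = np.ones((2, 2, 2), dtype=bool)

rmse_baseline = masked_rmse(tia_baseline, reference, organ_mask)
rmse_new = masked_rmse(tia_new, reference, organ_mask)
reduction = relative_reduction(rmse_baseline, rmse_new)
```

Under this definition, a reduction of 0.27 corresponds to an RMSE that is 27% lower than the baseline within the chosen mask.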