The hippocampal-entorhinal circuit is thought to play an important role in the spatial cognition of animals. However, how information flows within this circuit, and how that flow contributes to the function of grid-cell modules, remain open questions. Prevailing theories hold that grid cells are driven primarily by self-motion inputs within the Medial Entorhinal Cortex (MEC), while place cells play a secondary role by supporting the visual calibration of grid cells. Recent evidence, however, suggests that self-motion inputs and visual cues may jointly contribute to the formation of grid-like patterns. In this paper, we introduce a novel Continuous Attractor Network (CAN) model based on a spatial transformation mechanism. This mechanism integrates self-motion inputs and visual cues within grid-cell modules, which together drive the formation of grid-like patterns. At the level of individual neurons, the model reproduces grid firing patterns. At the level of population activity, the network forms an activity bump and drives its movement, capturing the defining function of grid-cell modules, namely path integration. Further experiments show that the model achieves strong path-integration performance. This study offers new insight into how self-motion and visual inputs contribute to neural activity within grid-cell modules. It also provides theoretical support for achieving accurate path integration, with substantial implications for applications that require spatial navigation and mapping.