Deep clustering, which can elegantly exploit data representations to seek a partition of the samples, has attracted intensive attention. Recently, combining an auto-encoder (AE) with graph neural networks (GNNs) has achieved excellent performance in clustering tasks by introducing the structural information implied among the data. However, we observe that most existing works have several limitations: 1) practical graph datasets contain noisy or inaccurate connections among nodes, which can confuse network learning and bias the learned representations, leading to unsatisfactory clustering performance; 2) they lack a dynamic information fusion module that carefully combines and refines the node attributes and the graph structural information to learn more consistent representations; and 3) they fail to exploit the information from the two separate views to generate a more robust target distribution. To solve these problems, we propose a novel method termed deep fusion clustering network with reliable structure preservation (DFCN-RSP). Specifically, a random walk mechanism is introduced to boost the reliability of the original graph structure by measuring localized structure similarities among nodes; it simultaneously filters out noisy connections and supplements reliable connections in the original graph. Moreover, we provide a transformer-based graph auto-encoder (TGAE) that uses a self-attention mechanism together with the localized structure similarity information to fine-tune the fused topology structure among nodes layer by layer. Furthermore, we provide a dynamic cross-modality fusion strategy to combine the representations learned by TGAE and AE. We also design a triplet self-supervision strategy and a target distribution generation measure to explore the cross-modality information. Experimental results on five public benchmark datasets show that DFCN-RSP is more competitive than state-of-the-art deep clustering algorithms.
The corresponding code is available at https://github.com/gongleii/DFCN-RSP.
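The random-walk structure refinement summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (the linked repository holds that). It assumes that the localized structure similarity is the average of the 1- to t-step random-walk transition probabilities, and that the refined graph keeps each node's k most similar nodes as neighbors. The helpers `random_walk_similarity` and `refine_graph` and the parameters `steps` and `k` are hypothetical names chosen for illustration.

```python
import numpy as np


def random_walk_similarity(adj: np.ndarray, steps: int = 3) -> np.ndarray:
    """Hypothetical helper: average the 1..steps random-walk transition matrices."""
    n = adj.shape[0]
    a = adj.astype(float) + np.eye(n)         # self-loops keep every row normalizable
    p = a / a.sum(axis=1, keepdims=True)      # row-stochastic transition matrix
    sim = np.zeros((n, n))
    pk = np.eye(n)
    for _ in range(steps):
        pk = pk @ p                           # t-step transition probabilities
        sim += pk
    sim /= steps
    return (sim + sim.T) / 2                  # symmetrize for an undirected graph


def refine_graph(adj: np.ndarray, steps: int = 3, k: int = 3) -> np.ndarray:
    """Hypothetical helper: keep each node's k most similar nodes as neighbors."""
    sim = random_walk_similarity(adj, steps)
    np.fill_diagonal(sim, -np.inf)            # never select a node as its own neighbor
    n = adj.shape[0]
    refined = np.zeros((n, n))
    for i in range(n):
        refined[i, np.argsort(-sim[i])[:k]] = 1.0
    return np.maximum(refined, refined.T)     # union of selections keeps it symmetric


# Usage: two 4-node cliques joined by one cross-cluster ("noisy") edge.
adj = np.zeros((8, 8))
for block in (range(0, 4), range(4, 8)):
    for i in block:
        for j in block:
            if i != j:
                adj[i, j] = 1.0
adj[3, 4] = adj[4, 3] = 1.0
refined = refine_graph(adj, steps=3, k=3)
print(refined[3, 4])  # the bridge edge is dropped: 0.0
```

Because the top-k rule ranks all nodes rather than only existing neighbors, it can drop a low-similarity edge (here, the one between nodes 3 and 4) and, in other graphs, add a high-similarity edge that is missing from the input. The choice of averaging transition probabilities and of a fixed k is an assumption of this sketch.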