Graphs are used in various disciplines such as telecommunication, biological networks, as well as social networks. In large-scale networks, it is challenging to detect the communities by learning the distinct properties of the graph. As deep learning has made contributions in a variety of domains, we try to use deep learning techniques to mine the knowledge from large-scale graph networks. In this paper, we aim to provide a strategy for detecting communities using deep autoencoders and obtain generic neural attention to graphs. The advantages of neural attention are widely seen in the field of NLP and computer vision, which has low computational complexity for large-scale graphs. The contributions of the paper are summarized as follows. Firstly, a transformer is utilized to downsample the first-order proximities of the graph into a latent space, which can result in the structural properties and eventually assist in detecting the communities. Secondly, the fine-tuning task is conducted by tuning variant hyperparameters cautiously, which is applied to multiple social networks (Facebook and Twitch). Furthermore, the objective function (cross-entropy) is tuned by L0 regularization. Lastly, the reconstructed model forms communities that present the relationship between the groups. The proposed robust model provides good generalization and is applicable to obtaining not only the community structures in social networks but also the node classification. The proposed graph-transformer shows advanced performance on the social networks with the average NMIs of 0.67 ± 0.04, 0.198 0.02, 0.228 ± 0.02, and 0.68 ± 0.03 on Wikipedia crocodiles, Github Developers, Twitch England, and Facebook Page-Page networks, respectively.