Deep spectral clustering is an advanced unsupervised deep learning method with widespread applications. However, when applied to large-scale datasets, it may suffer from an unstable training process, limited scalability, and diminished interpretability. To mitigate these limitations, a novel unsupervised deep learning algorithm named Attention Non-negative Spectral Clustering (ANSC) is proposed. Specifically, a network architecture called the Spectral Attention Network (SAN) is designed to serve as the backbone of ANSC, improving the stability of the training process and the scalability of the model. This attention-based architecture consists of a feature extraction network, an eigenvector mapping network, and an orthogonalization module. Furthermore, to enhance the interpretability of ANSC, non-negative constraints are introduced so that the objective function incorporates a cluster indicator matrix. Additionally, the ℓ2,1-norm is employed as a regularizer to approximate the solution of the objective function. A lower bound on the network size of ANSC is also analyzed based on the VC dimension. Extensive experiments validate that ANSC outperforms state-of-the-art clustering methods in terms of clustering accuracy, normalized mutual information, and adjusted Rand index.
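
To make the described components concrete, the following is a minimal PyTorch sketch of an attention-based spectral embedding network of the kind outlined above. It is not the paper's implementation: the MLP feature extractor, the single self-attention layer used as the eigenvector mapping network, the QR-based orthogonalization, the ReLU non-negativity relaxation, and all layer sizes are illustrative assumptions; only the overall structure (feature extraction, eigenvector mapping, orthogonalization, non-negative indicator, ℓ2,1 regularization) follows the abstract.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SpectralAttentionSketch(nn.Module):
    """Illustrative sketch of an attention-based spectral embedding network.

    Assumptions (not from the paper): an MLP feature extractor, one
    multi-head self-attention layer as the eigenvector mapping network,
    and QR decomposition as the orthogonalization module.
    """

    def __init__(self, in_dim, hid_dim, n_clusters, n_heads=4):
        super().__init__()
        # Feature extraction network (assumed to be a small MLP).
        self.extractor = nn.Sequential(
            nn.Linear(in_dim, hid_dim), nn.ReLU(),
            nn.Linear(hid_dim, hid_dim), nn.ReLU(),
        )
        # Eigenvector mapping network (assumed self-attention over the batch).
        self.attn = nn.MultiheadAttention(hid_dim, n_heads, batch_first=True)
        self.to_embedding = nn.Linear(hid_dim, n_clusters)

    def forward(self, x):
        h = self.extractor(x)                    # (n, hid_dim) features
        h_seq = h.unsqueeze(0)                   # treat the batch as one sequence
        a, _ = self.attn(h_seq, h_seq, h_seq)    # attention-refined features
        y = self.to_embedding(a.squeeze(0))      # (n, n_clusters) spectral embedding
        # Orthogonalization module: QR yields column-orthonormal embeddings.
        q, _ = torch.linalg.qr(y)
        # Non-negative relaxation so the output acts as a cluster indicator matrix.
        return F.relu(q)


def l21_norm(m):
    """ℓ2,1-norm: sum of the ℓ2 norms of the rows, used here as a regularizer."""
    return m.norm(dim=1).sum()


if __name__ == "__main__":
    net = SpectralAttentionSketch(in_dim=64, hid_dim=128, n_clusters=10)
    x = torch.randn(256, 64)                     # toy batch of 256 samples
    indicator = net(x)
    reg = l21_norm(indicator)
    print(indicator.shape, reg.item())
```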