ABSTRACT Representation learning techniques are frequently applied in multimedia content analysis and retrieval. In this study, an efficient multimedia data clustering method is presented, which consists of two independent algorithms. First, we propose a new representation framework that incorporates sparse coding and manifold regularisation into a single optimisation objective function, from which a coarse estimate of the cluster indicator matrix is obtained by imposing a sparsity-inducing norm. Second, we refine the estimated cluster indicator matrix by performing spectral rotation, so that an optimal cluster assignment can be learned. Compared with existing methods, our method takes into account global matrix reconstruction information and local manifold information simultaneously, so that both global and local structure are respected. Additionally, a theoretical justification of the proposed representation method is presented. Comprehensive experiments on six real-world image datasets demonstrate the effectiveness and efficiency of our method in comparison with state-of-the-art clustering methods.