Multi-view clustering, which exploits multi-view information to partition data into clusters, has attracted intense attention. However, most existing methods learn a similarity graph directly from the original multi-view features, which inevitably contain noise and redundant information. As a result, the learned similarity graph is inaccurate and insufficient to depict the underlying cluster structure of multi-view data. To address this issue, we propose a novel multi-view clustering method that constructs an essential similarity graph in a spectral embedding space instead of the original feature space. Concretely, we first obtain multiple spectral embedding matrices from the view-specific similarity graphs, and reorganize the Gram matrices, constructed as inner products of the normalized spectral embedding matrices, into a tensor. Then, we impose a weighted tensor nuclear norm constraint on the tensor to capture high-order consistent information among multiple views. Furthermore, we integrate spectral embedding and low-rank tensor learning into a unified optimization framework to determine the spectral embedding matrices and the tensor representation jointly. Finally, we obtain the consensus similarity graph from the Gram matrices in an adaptive neighbor manner. An efficient optimization algorithm is designed to solve the resulting optimization problem. Extensive experiments on six benchmark datasets verify the efficacy of the proposed method. The code is implemented using MATLAB R2018a and the MindSpore library <xref ref-type="bibr" rid="ref1">[1]</xref>: <uri>https://github.com/guanyuezhen/CGL</uri>.
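The first stage described above (view-specific graphs, spectral embeddings, and Gram matrices collected into a tensor) can be illustrated with a minimal NumPy sketch. This is not the released MATLAB/MindSpore implementation. The Gaussian-kernel graph, the bandwidth `sigma`, the function names, and the plain averaging in `naive_consensus` are all assumptions made for illustration. In particular, the averaging is only a stand-in for the weighted tensor nuclear norm and adaptive neighbor steps, which are not reproduced here.

```python
import numpy as np


def gaussian_graph(X, sigma=1.0):
    """Dense Gaussian-kernel similarity graph for one view (assumed graph type)."""
    sq = np.sum(X ** 2, axis=1)
    # Squared Euclidean distances, clipped at zero to absorb rounding error.
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)
    W = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)  # no self-loops
    return W


def spectral_embedding(W, c):
    """Top-c eigenvectors of D^{-1/2} W D^{-1/2}, with rows scaled to unit length."""
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    S = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    S = (S + S.T) / 2.0  # enforce exact symmetry before eigh
    _, vecs = np.linalg.eigh(S)  # eigenvalues are returned in ascending order
    F = vecs[:, -c:]
    norms = np.linalg.norm(F, axis=1, keepdims=True)
    return F / np.maximum(norms, 1e-12)


def gram_tensor(views, c, sigma=1.0):
    """Stack the Gram matrices F F^T of all views into an n x n x V tensor."""
    grams = []
    for X in views:
        F = spectral_embedding(gaussian_graph(X, sigma), c)
        grams.append(F @ F.T)
    return np.stack(grams, axis=2)


def naive_consensus(T):
    """Placeholder consensus graph: the plain average of the frontal slices."""
    return T.mean(axis=2)
```

Because each embedding row has unit length, every entry of a Gram matrix is a cosine similarity between two samples in the embedding space. This puts the slices of the tensor on a common scale across views.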