Real-world applications frequently describe the same samples through multiple data modalities, which are referred to as multi-view data. Multi-view clustering has been studied extensively in recent years to reveal the heterogeneity embedded in such data. However, most existing methods emphasize low-order correlations across views, whereas approaches that incorporate high-order correlations are limited either by treating all views as equally significant or by a trade-off between global and local consistency. In this paper, we propose Co-MSE, a co-regularized optimal graph-based clustering method that integrates correlations of different orders. Integrating first-order and second-order similarities preserves the local structure, while co-regularization simultaneously yields an optimized embedding representation for the multi-view data. We demonstrate that Co-MSE helps provide a more suitable embedding representation and, in turn, enables satisfactory clustering performance. Extensive experiments on real-world datasets confirm the effectiveness and advantages of the proposed method.