This paper proposes an unsupervised color–texture image segmentation method. To improve segmentation quality, a new color–texture descriptor is designed by integrating the compact multi-scale structure tensor (MSST), total variation (TV) flow, and color information. Because MSST does not separate regions with large-scale texture well, TV flow is used as an auxiliary texture descriptor that extracts local scale information. To segment the color–texture image in an unsupervised, multi-label way, the multivariate mixed Student's t-distribution (MMST) is chosen for probability distribution modeling, as it describes the distribution of color–texture features accurately. Since the valid class number is hard to determine in advance, a component-wise expectation–maximization algorithm for MMST (CEM3ST) is proposed, which effectively initializes the valid class number. The energy functional is then built according to the valid class number and optimized by the multilayer graph cuts method. However, over-segmentation and erroneous segmentation often occur. To overcome this problem, a regional credibility merging (RCM) strategy is presented that integrates the regional adjacency relationship, region size, common edge between regions, and regional color–texture dissimilarity. To terminate the whole segmentation process, an adaptive iteration convergence criterion is designed, which combines the negative logarithm of the probability of all color–texture features with the Kullback–Leibler (KL) divergence for MMST. Experiments on a large number of synthetic color–texture images and real natural scene images demonstrate the advantages of the proposed method, including effective reduction of over-segmentation and erroneous segmentation, high segmentation accuracy, and better visual integrity and consistency.
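To make the MMST modeling step concrete, the following is a minimal sketch of how a mixture of multivariate Student's t-distributions can assign soft class memberships to feature vectors. It assumes the standard multivariate Student's t density and a plain posterior (E-step style) computation. It is not the CEM3ST algorithm itself, and the function names `student_t_logpdf` and `responsibilities`, as well as all parameter values, are hypothetical choices made for illustration.

```python
import math
import numpy as np

def student_t_logpdf(x, mean, cov, dof):
    """Log-density of a multivariate Student's t-distribution.

    x: (n, d) feature vectors, mean: (d,), cov: (d, d) scale matrix,
    dof: degrees of freedom (> 0). Illustrative helper, not from the paper.
    """
    x = np.atleast_2d(np.asarray(x, dtype=float))
    mean = np.atleast_1d(np.asarray(mean, dtype=float))
    cov = np.atleast_2d(np.asarray(cov, dtype=float))
    d = mean.shape[0]
    diff = x - mean
    # Squared Mahalanobis distance of each row of x to the mean.
    delta = np.sum(diff * np.linalg.solve(cov, diff.T).T, axis=1)
    _, logdet = np.linalg.slogdet(cov)
    log_norm = (math.lgamma((dof + d) / 2.0) - math.lgamma(dof / 2.0)
                - 0.5 * d * math.log(dof * math.pi) - 0.5 * logdet)
    return log_norm - 0.5 * (dof + d) * np.log1p(delta / dof)

def responsibilities(x, weights, means, covs, dofs):
    """Posterior probability of each mixture component for each sample."""
    log_terms = np.stack(
        [math.log(w) + student_t_logpdf(x, m, c, v)
         for w, m, c, v in zip(weights, means, covs, dofs)],
        axis=1,
    )
    # Subtract the row-wise maximum before exponentiating for stability.
    log_terms -= log_terms.max(axis=1, keepdims=True)
    resp = np.exp(log_terms)
    return resp / resp.sum(axis=1, keepdims=True)

# Toy usage: two 2-D feature vectors and two assumed components.
features = np.array([[0.0, 0.0], [5.0, 5.0]])
resp = responsibilities(
    features,
    weights=[0.5, 0.5],
    means=[np.zeros(2), np.full(2, 5.0)],
    covs=[np.eye(2), np.eye(2)],
    dofs=[3.0, 3.0],
)
labels = resp.argmax(axis=1)  # hard label per feature vector
```

In a pipeline like the one described above, the rows of `features` would be per-pixel color–texture descriptors, and the negative log of the mixture probability would feed the data term of the energy functional. The heavier tails of the Student's t density, relative to a Gaussian, are what make such a model less sensitive to outlying feature vectors.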