Graph coloring has a wide range of real world applications, such as in the operations research, communication network, computational biology and compiler optimization fields. In our recent work [1], we propose a divide-and-conquer approach for graph coloring, called VColor. Such an approach has three generic subroutines. (i) Graph partition subroutine: VColor partitions a graph G into a vertex cut partition (VP), which comprises a vertex cut component (VCC) and small non-overlapping connected components (CCs). (ii) Component coloring subroutine: VColor colors the VCC and the CCs by efficient algorithms. (iii) Color combination subroutine: VColor combines the local colors by exploiting the maximum matchings of color combination bigraphs (CCBs). VColor has revealed some major bottlenecks of efficiency in these subroutines. Therefore, in this paper, we propose VColor*, an approach which addresses these efficiency bottlenecks without using more colors both theoretically and experimentally. The technical novelties of this paper are the following. (i) We propose the augmented VP to index the crossing edges of the VCC and the CCs and propose an optimized CCB construction algorithm. (ii) For sparse CCs, we propose using a greedy coloring algorithm that is of polynomial time complexity in the worst case, while preserving the approximation ratio. (iii) We propose a distributed graph coloring algorithm. Our extensive experimental evaluation on real-world graphs confirms the efficiency of VColor*. In particular, VColor* is 20X and 50X faster than VColor and uses the same number of colors with VColor on the Pokec and PA datasets, respectively. VColor* also significantly outperforms the state-of-the-art graph coloring methods.