Abstract
The design of acoustic models is vital for building a reliable connection between the acoustic waveform and the linguistic message, expressed in terms of individual speech units. Based on the characteristics of Chinese phonemes, the set of basic acoustic phone units is determined and refined, and a decision-tree-based state-tying approach is explored. Because one advantage of the top-down tying method is its flexibility in balancing model accuracy against model complexity, the relevant parameters, such as the stopping criterion for decision-tree node splitting, are adjusted and optimal thresholds are determined. The resulting models achieve higher acoustic modeling accuracy while the model size is kept small enough to be trained reliably.
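The abstract does not spell out the stopping criterion for node splitting. As a minimal sketch, assuming the widely used likelihood-based formulation (each cluster of tied states modeled by a single diagonal-covariance Gaussian, and a split accepted only if its log-likelihood gain exceeds a threshold and each child keeps a minimum occupancy), the two thresholds could interact as shown below. The names `StateStats`, `make_state`, `cluster_loglik`, `grow_tree`, `min_gain` and `min_occ`, as well as the toy contexts and phonetic questions, are hypothetical illustrations and not the authors' implementation.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class StateStats:
    """Accumulated statistics for one context-dependent HMM state (hypothetical format)."""
    context: str         # context label tested by the phonetic questions
    occ: float           # state occupancy (sum of frame posteriors)
    sum_x: List[float]   # per-dimension sum of posterior-weighted observations
    sum_x2: List[float]  # per-dimension sum of posterior-weighted squared observations


def make_state(context: str, mean: List[float], var: List[float], occ: float) -> StateStats:
    """Build statistics consistent with a Gaussian of the given mean and variance."""
    return StateStats(context, occ,
                      [occ * m for m in mean],
                      [occ * (v + m * m) for m, v in zip(mean, var)])


def cluster_loglik(states: List[StateStats], var_floor: float = 1e-6) -> float:
    """Log-likelihood of the pooled data under one ML-fitted diagonal Gaussian."""
    occ = sum(s.occ for s in states)
    dim = len(states[0].sum_x)
    log_det = 0.0
    for d in range(dim):
        mean = sum(s.sum_x[d] for s in states) / occ
        var = sum(s.sum_x2[d] for s in states) / occ - mean * mean
        log_det += math.log(max(var, var_floor))  # floor guards against log(0)
    return -0.5 * occ * (dim * (1.0 + math.log(2.0 * math.pi)) + log_det)


def grow_tree(states: List[StateStats], questions: Dict[str, Set[str]],
              min_gain: float, min_occ: float) -> List[List[StateStats]]:
    """Split top-down with the best question per node; return the leaves (tied states)."""
    parent_ll = cluster_loglik(states)
    best = None  # (gain, yes_states, no_states)
    for qset in questions.values():
        yes = [s for s in states if s.context in qset]
        no = [s for s in states if s.context not in qset]
        if not yes or not no:
            continue  # this question does not partition the node
        if min(sum(s.occ for s in yes), sum(s.occ for s in no)) < min_occ:
            continue  # stopping criterion 1: a child would have too little data
        gain = cluster_loglik(yes) + cluster_loglik(no) - parent_ll
        if best is None or gain > best[0]:
            best = (gain, yes, no)
    if best is None or best[0] < min_gain:
        return [states]  # stopping criterion 2: no split gains enough likelihood
    _, yes, no = best
    return (grow_tree(yes, questions, min_gain, min_occ)
            + grow_tree(no, questions, min_gain, min_occ))


if __name__ == "__main__":
    # Toy 1-D example: two vowel-like and two consonant-like contexts.
    states = [make_state("a", [10.0], [1.0], 10.0),
              make_state("e", [10.5], [1.0], 10.0),
              make_state("b", [0.0], [1.0], 10.0),
              make_state("p", [0.5], [1.0], 10.0)]
    questions = {"vowel": {"a", "e"}, "is_b": {"b"}}
    for min_gain in (10.0, 0.1):
        leaves = grow_tree(states, questions, min_gain, min_occ=5.0)
        print(min_gain, [[s.context for s in leaf] for leaf in leaves])
```

In this toy setting, a gain threshold of 10.0 keeps only the vowel/consonant split (two tied states), while lowering it to 0.1 also separates "b" from "p" (three tied states); raising `min_occ` above the per-state occupancy blocks that last split regardless of the gain. This illustrates the accuracy-versus-complexity trade-off that tuning the thresholds controls.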