Currently, hidden Markov model (HMM)-based multi-step attack detection models are mainly trained with the unsupervised Baum–Welch algorithm. The Baum–Welch algorithm is sensitive to the initial values of the model parameters, yet in practice these parameters are initialized randomly or uniformly, which frequently causes training to converge to a local optimum; the resulting model then fits the alert logs poorly, reducing its detection effectiveness. To address this issue, we propose a pre-training method for multi-step attack detection models that exploits the high semantic similarity of alerts belonging to the same attack stage. The method first clusters alerts by their semantic information to pre-classify the attack stage each alert belongs to. The distance from each alert vector to each attack stage is then converted into the probability of that stage generating the alert, and these probabilities replace the random initial values used by the Baum–Welch algorithm. The effectiveness of the proposed method is evaluated on the DARPA 2000, DEFCON21 CTF, and ISCXIDS 2012 datasets. The experimental results show that the hidden Markov multi-step attack detection method based on the proposed parameter pre-training achieves higher detection accuracy than the Baum–Welch-based, K-means-based, and transfer-learning differential-evolution-based hidden Markov multi-step attack detection methods.
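To make the initialization idea concrete, the following is a minimal sketch (not the paper's implementation) of how distances between alert-type vectors and attack-stage centroids could be converted into an initial emission matrix for Baum–Welch training. The use of K-means for the semantic clustering step, the softmax-style distance-to-probability conversion, and the function name `pretrain_emission_matrix` are all illustrative assumptions, not details taken from the paper.

```python
# Sketch of pre-training an HMM emission matrix from alert semantics.
# Assumptions: each distinct alert type already has a fixed-length semantic
# vector, and the number of attack stages (hidden states) is known.
import numpy as np
from sklearn.cluster import KMeans

def pretrain_emission_matrix(alert_type_vectors, n_stages, temperature=1.0):
    """Cluster alert-type vectors into attack stages and convert each
    alert type's distance to every stage centroid into an emission
    probability.

    Returns an (n_stages x n_alert_types) matrix whose rows sum to 1;
    it can replace a random initial emission matrix before Baum-Welch.
    """
    # 1. Pre-classify alert types: one cluster per assumed attack stage.
    km = KMeans(n_clusters=n_stages, n_init=10, random_state=0)
    labels = km.fit_predict(alert_type_vectors)

    # 2. Distance of every alert-type vector to every stage centroid,
    #    shape (n_alert_types, n_stages).
    dists = km.transform(alert_type_vectors)

    # 3. Convert distances to probabilities: a smaller distance means a
    #    higher probability of that stage emitting the alert type
    #    (softmax of negative distance, normalized per stage).
    logits = -dists.T / temperature              # (n_stages, n_alert_types)
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    emission = probs / probs.sum(axis=1, keepdims=True)
    return emission, labels

# Example with synthetic data: 50 alert types, 4 assumed attack stages.
rng = np.random.default_rng(0)
alert_vectors = rng.normal(size=(50, 16))
B_init, stage_labels = pretrain_emission_matrix(alert_vectors, n_stages=4)
print(B_init.shape, B_init.sum(axis=1))  # (4, 50); each row sums to 1
```

In this sketch, `B_init` would be supplied as the initial emission matrix to a standard Baum–Welch trainer in place of random values; the transition and initial-state probabilities would still be initialized and re-estimated as usual.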