Abstract
Elevators are part of everyday life, and controlling them by voice can reduce public health risks and improve the user experience in the post-epidemic era. However, an elevator is an enclosed public space, which makes voice control difficult to deploy there. To address this problem, we propose an end-to-end speech command recognition algorithm based on a new deep neural network. The network is trained on a self-built corpus. At run time, a speech segment interception algorithm extracts audio segments from the continuous speech stream. Each segment is fed to the network in real time for inference, and the elevator is driven according to the output. Compared with an MFCC-CNN recognition model, the network improves accuracy by about 10% while requiring less computation. Compared with the DTW algorithm, it improves accuracy by about 25%. Finally, we build a typical deployment environment to demonstrate the correctness and effectiveness of the method in practical application.
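The abstract does not say how the speech segment interception algorithm cuts segments out of the stream. As one illustration, the sketch below assumes a common technique, short-time-energy endpoint detection, and is not the authors' method. The class name `StreamSegmenter`, the frame length (400 samples, i.e. 25 ms at an assumed 16 kHz), the energy threshold, the hangover count and the minimum segment length are all hypothetical choices.

```python
import numpy as np


class StreamSegmenter:
    """Hypothetical energy-based segmenter for a continuous audio stream.

    Assumption: this is generic energy-threshold endpoint detection, not the
    paper's interception algorithm. Frames do not overlap, for simplicity.
    A frame is 'voiced' if its mean-square energy exceeds `threshold`.
    A segment starts at the first voiced frame and closes after `hangover`
    consecutive unvoiced frames. Segments with fewer than `min_frames`
    voiced frames are discarded as noise.
    """

    def __init__(self, frame_len=400, threshold=1e-3, hangover=10, min_frames=5):
        self.frame_len = frame_len
        self.threshold = threshold
        self.hangover = hangover
        self.min_frames = min_frames
        self._buf = np.zeros(0, dtype=np.float32)  # samples not yet framed
        self._seg = []      # frames of the segment in progress
        self._voiced = 0    # voiced frames in the current segment
        self._silence = 0   # consecutive unvoiced frames at the segment's end

    def push(self, chunk):
        """Feed new samples; return a list of completed segments (1-D arrays)."""
        self._buf = np.concatenate([self._buf, np.asarray(chunk, dtype=np.float32)])
        done = []
        # Consume as many whole frames as the buffer holds; keep the remainder.
        while len(self._buf) >= self.frame_len:
            frame = self._buf[:self.frame_len]
            self._buf = self._buf[self.frame_len:]
            if float(np.mean(frame ** 2)) > self.threshold:
                self._seg.append(frame)
                self._voiced += 1
                self._silence = 0
            elif self._seg:
                # Short pauses inside a command are kept in the segment.
                self._seg.append(frame)
                self._silence += 1
                if self._silence >= self.hangover:
                    seg = self._finish()
                    if seg is not None:
                        done.append(seg)
        return done

    def flush(self):
        """Close any segment still open when the stream ends."""
        return self._finish() if self._seg else None

    def _finish(self):
        # Drop the trailing unvoiced frames, then reset the state.
        frames = self._seg[:len(self._seg) - self._silence]
        keep = self._voiced >= self.min_frames
        self._seg, self._voiced, self._silence = [], 0, 0
        return np.concatenate(frames) if keep else None


# Usage: 0.5 s silence, 0.5 s tone, 0.5 s silence, fed in 1000-sample chunks.
sr = 16000
tone = 0.5 * np.sin(2 * np.pi * 440 * np.arange(8000) / sr)
stream = np.concatenate([np.zeros(8000), tone, np.zeros(8000)])
segmenter = StreamSegmenter()
segments = []
for i in range(0, len(stream), 1000):
    segments.extend(segmenter.push(stream[i:i + 1000]))
print(len(segments), len(segments[0]))  # 1 8000
```

In a pipeline like the one described, each returned segment would then go to the recognition network. Networks that expect a fixed input size usually need segments padded or cropped to one length first; that step is left out here.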