Abstract In single-channel speech separation, extracting and separating features from mixed audio has long been a central research challenge. Current mainstream methods suffer from poor generalization and inadequate feature extraction, which limits their separation performance. This paper proposes an improved DConv-TasNet network model that optimizes the encoder/decoder and separation modules, using deep dilated encoders/decoders to extract features from the mixed speech signal; compared with conventional encoders/decoders, these offer stronger feature extraction and better generalization. Within the separation module, the convolutional blocks are improved by enhancing feature extraction along the channel dimension, which raises the performance of the separation network. The model is validated on the WSJ0-Mix2 dataset, where it outperforms the Conv-TasNet network.
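The deep dilated encoder mentioned in the abstract can be illustrated with a minimal PyTorch sketch: a stack of 1-D convolutions whose dilation doubles per layer, in place of the single `Conv1d` encoder used in Conv-TasNet. The class name, layer count, and channel sizes below are illustrative assumptions, not the paper's actual configuration.

```python
import torch
import torch.nn as nn

class DilatedEncoder(nn.Module):
    """Hypothetical sketch of a deep dilated encoder: stacked 1-D
    convolutions with increasing dilation widen the receptive field
    over the raw waveform. Hyperparameters are illustrative only."""
    def __init__(self, in_ch=1, feat_ch=64, kernel=3, n_layers=3):
        super().__init__()
        layers = []
        ch = in_ch
        for i in range(n_layers):
            d = 2 ** i  # dilation doubles per layer: 1, 2, 4, ...
            # padding chosen so the time dimension is preserved
            layers += [nn.Conv1d(ch, feat_ch, kernel, dilation=d,
                                 padding=d * (kernel - 1) // 2),
                       nn.ReLU()]
            ch = feat_ch
        self.net = nn.Sequential(*layers)

    def forward(self, x):      # x: (batch, 1, time) raw mixture
        return self.net(x)     # (batch, feat_ch, time) feature map

mix = torch.randn(2, 1, 16000)   # batch of two 1 s mixtures at 16 kHz
feats = DilatedEncoder()(mix)
print(feats.shape)               # torch.Size([2, 64, 16000])
```

A matching decoder would mirror this stack (e.g. with transposed or same-padded convolutions) to map the masked feature map back to a waveform.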