In recent years, Transformers have revolutionized medical image segmentation. Several studies have combined convolutional neural networks with Transformers under the UNet architecture. However, these approaches account for neither the speed of segmentation nor the feature-extraction capacity within the Transformer: they overlook the fact that subtly reshaping the feature maps enables rapid extraction of both local and global information. To address these problems, we propose CTBANet (Convolutional Transformer and Bidirectional Attention Based for Medical Image Segmentation), which has two prominent components: the CTblock (Convolutional Combined Transformer module) and the BAblock (Bidirectional Attention block). The CTblock integrates the strengths of CNNs and Transformers, enabling it to extract both spatial detail and global context. To improve the model's speed and accuracy, multi-scale pyramid pooling is embedded into the position attention module (PAM), yielding APAM (Asymmetric PAM), and strip convolution is embedded into the channel attention module (CAM), yielding ACAM (Asymmetric CAM). Medical image segmentation is a critical task in the medical field, and experimental results on standard benchmarks show that our model segments medical images markedly more accurately and faster than competing methods.
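The abstract does not spell out how pyramid pooling makes the position attention module asymmetric, so the following is a minimal NumPy sketch of the general idea, not the paper's implementation: queries are kept at full spatial resolution, while keys and values are replaced by a small set of pyramid-pooled anchors, shrinking the attention matrix from N x N to N x S (with S << N). All function names, pyramid bin sizes, and shapes here are illustrative assumptions.

```python
import numpy as np

def adaptive_avg_pool(x, bins):
    # x: (C, H, W) -> (C, bins, bins), averaging over roughly equal tiles
    C, H, W = x.shape
    hs = np.linspace(0, H, bins + 1).astype(int)
    ws = np.linspace(0, W, bins + 1).astype(int)
    out = np.zeros((C, bins, bins))
    for i in range(bins):
        for j in range(bins):
            out[:, i, j] = x[:, hs[i]:hs[i + 1], ws[j]:ws[j + 1]].mean(axis=(1, 2))
    return out

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def asymmetric_position_attention(feat, pyramid=(1, 3, 6, 8)):
    # Queries cover all N = H*W positions; keys/values use only the
    # S = sum(b*b) pooled anchors, so attention costs O(N*S), not O(N^2).
    C, H, W = feat.shape
    q = feat.reshape(C, H * W).T                       # (N, C)
    anchors = np.concatenate(
        [adaptive_avg_pool(feat, b).reshape(C, -1) for b in pyramid],
        axis=1,
    )                                                  # (C, S)
    attn = softmax(q @ anchors / np.sqrt(C), axis=1)   # (N, S)
    out = (attn @ anchors.T).T.reshape(C, H, W)        # aggregate anchor values
    return feat + out                                  # residual connection

feat = np.random.rand(16, 32, 32)
out = asymmetric_position_attention(feat)
print(out.shape)  # (16, 32, 32)
```

With the bins (1, 3, 6, 8) above, a 32 x 32 map attends over only 110 anchors instead of 1024 positions, which is where the claimed speedup of an asymmetric attention module comes from.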