Owing to their powerful feature extraction capability, convolutional neural networks (CNNs) have been increasingly applied to gas identification in electronic nose (e-nose) systems. However, the sensor responses of different intensities in an e-nose system are significantly correlated, and a CNN extracts local gas features by convolution while ignoring this global correlation. A Transformer, by contrast, combines different responses and captures the correlation among global features through self-attention. This paper proposes a lightweight hybrid network called Peak Search-based Convolutional Transformers (PSCFormer). First, based on the data characteristics of gas information, a Local Peak Search and Feature Fusion (LPSF) module is proposed to focus on the key gas features. Second, a Transformer Encoder (TE) is introduced to capture the global correlation among features, and a parallel Convolution Encoder (CE) is introduced to capture local dependence. Finally, a feature complementation mechanism is presented that alleviates the TE's preference for the slow-down response while overcoming the receptive field limitation of the CE. PSCFormer is evaluated on three different datasets, on all of which it shows stable and excellent performance with a good tradeoff between efficiency and complexity. The results indicate that PSCFormer is an efficient and lightweight gas identification network, which provides a method to promote the engineering application of e-nose systems.
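The contrast between local convolution and global self-attention that motivates the parallel design can be illustrated with a minimal NumPy sketch. This is not the PSCFormer implementation: the array shapes, the moving-average kernel, the unprojected attention, and the concatenation used as a stand-in for feature complementation are all assumptions made for illustration, and the function names are hypothetical.

```python
import numpy as np


def local_conv_features(x, kernel):
    """Valid 1-D cross-correlation along time, applied to each sensor.

    x: (n_sensors, n_steps), kernel: (k,).
    Every output value depends on only k neighbouring samples (local view).
    """
    return np.stack([np.correlate(row, kernel, mode="valid") for row in x])


def self_attention(tokens):
    """Scaled dot-product self-attention without learned projections.

    tokens: (n_tokens, d). Every output row mixes all tokens (global view).
    Returns the attended tokens and the (n_tokens, n_tokens) weight matrix.
    """
    d = tokens.shape[1]
    scores = tokens @ tokens.T / np.sqrt(d)
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)  # row-wise softmax
    return weights @ tokens, weights


# Hypothetical input: 8 sensors, 64 time steps of synthetic response data.
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 64))

kernel = np.ones(5) / 5.0                      # assumed 5-tap smoothing kernel
local = local_conv_features(x, kernel)         # local branch, shape (8, 60)
global_feats, weights = self_attention(x)      # global branch, shape (8, 64)

# Assumed stand-in for feature complementation: plain concatenation.
fused = np.concatenate([local, global_feats], axis=1)  # shape (8, 124)
```

In this sketch each convolution output sees a window of five samples from one sensor, whereas each attention output is a weighted mixture of all eight sensor responses, which is the complementary behaviour the two parallel encoders are meant to exploit.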