Purpose
Abnormal behaviors of staff at petroleum stations pose significant safety hazards. Existing 3D ResNet behavior recognition models suffer from high parameter counts, lengthy training periods and low recognition rates. To address these problems, this paper proposes GTB-ResNet, a network designed to detect abnormal behaviors of petroleum station staff.

Design/methodology/approach
First, to reduce the excessive parameters and computational complexity of 3D ResNet, a lightweight residual convolution module called the Ghost residual module (GhostNet) is introduced in the feature extraction network. Ghost convolution replaces standard convolution, reducing model parameters while preserving multi-scale feature extraction capabilities. Second, because surveillance ranges are wide and target objects are small, a triplet attention module is integrated to strengthen the model's focus on salient features through interaction between spatial and channel information. Last, because short time-series features lead to misjudgments between similar actions, a bidirectional gated recurrent unit (Bi-GRU) network is added to the feature extraction backbone. This allows key long time-series features to be extracted, improving feature extraction accuracy.

Findings
The experimental dataset covers four behavior types: illegal phone answering, smoking and falling (abnormal), and touching the face (normal), for a total of 892 videos. GTB-ResNet achieves a recognition accuracy of 96.7% with a model parameter count of 4.46 M and a computational complexity of 3.898 G. Compared with 3D ResNet, this is a 4.4% improvement in accuracy, with reductions of 90.4% in parameters and 61.5% in computational complexity.

Originality/value
The 3D ResNet network is tailored for real-time action prediction on edge devices in petroleum stations. To address the large number of parameters in 3D ResNet networks and the resulting difficulty of deployment on edge devices, a lightweight residual module based on ghost convolution is developed. Additionally, to improve the low detection accuracy of behaviors in the noisy environment of petroleum stations, a triplet attention mechanism is introduced during feature extraction to enhance focus on salient features. Moreover, to reduce misjudgments arising from the similarity of actions, a Bi-GRU model is introduced to enhance the extraction of key long-term features.
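
To illustrate why replacing standard convolution with ghost convolution reduces parameters, the following minimal sketch counts the weights of a single 3D convolution layer in both forms. It assumes the commonly described ghost module layout: a primary convolution produces a fraction of the output channels, and a cheap depthwise convolution generates the remaining "ghost" channels. The function names, the channel widths, the kernel sizes and the ratio of 2 are illustrative assumptions and are not taken from the paper.

```python
import math


def conv3d_params(c_in: int, c_out: int, k: int) -> int:
    """Weight count of a standard 3D convolution with a k x k x k kernel (no bias)."""
    return c_in * c_out * k ** 3


def ghost_conv3d_params(c_in: int, c_out: int, k: int,
                        ratio: int = 2, dw_k: int = 3) -> int:
    """Weight count of a ghost-style 3D convolution (no bias).

    Assumed layout (hypothetical, for illustration only):
      1. a primary convolution produces ceil(c_out / ratio) intrinsic channels;
      2. a depthwise convolution with a dw_k x dw_k x dw_k kernel produces
         the remaining ghost channels, one filter per ghost channel.
    """
    intrinsic = math.ceil(c_out / ratio)
    ghost = intrinsic * (ratio - 1)
    primary_weights = c_in * intrinsic * k ** 3
    cheap_weights = ghost * dw_k ** 3  # depthwise: one input channel per filter
    return primary_weights + cheap_weights


# Example layer: 64 input channels, 64 output channels, 3 x 3 x 3 kernel.
standard = conv3d_params(64, 64, 3)
ghost = ghost_conv3d_params(64, 64, 3, ratio=2)
print(f"standard: {standard} weights, ghost: {ghost} weights, "
      f"reduction factor: {standard / ghost:.2f}")
```

With a ratio of 2 the layer needs roughly half the weights of the standard convolution. The overall reduction reported for GTB-ResNet depends on the full architecture and cannot be derived from this single-layer count.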