Objective: Traditional methods struggle to analyze fetal heart rate (FHR) signals accurately because of the complexity of accelerations and decelerations (Acc/Dec) and the circular dependency between their definitions and that of the baseline. We aim to develop a deep learning model, the Ensemble Transformer-Convolutional Neural Network (ETCNN), to improve the accuracy of baseline and Acc/Dec determination, and to validate its generalization on multi-center, multi-device test datasets.

Methods: We propose ETCNN, which treats FHR analysis as a one-dimensional signal segmentation problem. ETCNN consists of four subnetworks (TCNNs) with convolutional kernel sizes of 21, 31, 61, and 81, respectively. Each subnetwork integrates Channel-Residual (C-Res) modules and Channel Cross fusion with Transformer (CCT) modules. The C-Res modules dynamically prune irrelevant channels to focus on critical FHR episodes, while the CCT modules exploit multi-scale features to narrow semantic gaps.

Results: Trained on Lille Catholic University’s open-access database (LCU-DB), ETCNN outperformed twelve traditional methods and three deep learning models on four independent multi-center and multi-device test datasets. Ablation experiments confirmed that ensemble learning, multi-scale convolution, residual channel attention, channel cross fusion attention, and multi-head attention all contributed to the improved performance.

Conclusion and significance: ETCNN shows promise for accurate and efficient FHR analysis and generalized successfully across the test datasets. These advances have potential for clinical application in fetal monitoring.
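The idea of segmenting an FHR trace with an ensemble of subnetworks at several kernel sizes can be illustrated with a minimal Python sketch. Only the four kernel sizes (21, 31, 61, 81) and the framing as per-sample segmentation come from the abstract. Everything else is an assumption made for illustration: `toy_subnetwork` is a hypothetical stand-in that compares each sample with a moving average (it is not a TCNN and contains no C-Res or CCT modules), the 10 bpm `margin` is arbitrary, and averaging the class probabilities is just one plausible way to combine subnetworks, since the abstract does not state the combination rule.

```python
from typing import List, Sequence

KERNEL_SIZES = (21, 31, 61, 81)  # kernel sizes stated in the abstract
CLASSES = ("baseline", "acceleration", "deceleration")  # label indices 0, 1, 2


def moving_average(signal: Sequence[float], kernel_size: int) -> List[float]:
    """Same-length moving average; windows are truncated at the edges."""
    half = kernel_size // 2
    n = len(signal)
    out = []
    for i in range(n):
        window = signal[max(0, i - half):min(n, i + half + 1)]
        out.append(sum(window) / len(window))
    return out


def toy_subnetwork(signal: Sequence[float], kernel_size: int,
                   margin: float = 10.0) -> List[List[float]]:
    """Hypothetical stand-in for one subnetwork (NOT the paper's TCNN).

    Returns one probability vector over CLASSES per input sample, based on
    how far the sample deviates from a moving average of width kernel_size.
    The margin (in bpm) is an arbitrary illustrative threshold.
    """
    smooth = moving_average(signal, kernel_size)
    probs = []
    for x, s in zip(signal, smooth):
        deviation = x - s
        if deviation > margin:
            probs.append([0.1, 0.8, 0.1])   # leans towards acceleration
        elif deviation < -margin:
            probs.append([0.1, 0.1, 0.8])   # leans towards deceleration
        else:
            probs.append([0.8, 0.1, 0.1])   # leans towards baseline
    return probs


def ensemble_segment(signal: Sequence[float]) -> List[int]:
    """Average per-sample probabilities over all kernel sizes, then argmax.

    Averaging is an assumed combination rule, chosen for simplicity.
    """
    per_net = [toy_subnetwork(signal, k) for k in KERNEL_SIZES]
    labels = []
    for i in range(len(signal)):
        mean = [sum(net[i][c] for net in per_net) / len(per_net)
                for c in range(len(CLASSES))]
        labels.append(mean.index(max(mean)))
    return labels


# Synthetic trace: flat 140 bpm with a 30 bpm rise over samples 90..99.
signal = [140.0] * 200
for i in range(90, 100):
    signal[i] = 170.0
labels = ensemble_segment(signal)  # one class index per sample
```

In this toy trace the narrowest window (21) on its own marks the sample just before the rise as a deceleration, because its moving average is pulled upward by the rise, whereas the averaged output labels that sample as baseline. This is a property of the stand-in only and says nothing about the measured behaviour of ETCNN.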