Automated emotion recognition from physiological signals has gained significant attention in recent years owing to its potential applications in human–computer interaction, healthcare, and psychology. Electrocardiogram (ECG) signals are widely used because of their non-invasiveness, high temporal resolution, and direct relationship with the autonomic nervous system. In this study, we propose a novel approach to ECG-based emotion recognition that combines time-series-to-image encoding and texture-based features with machine learning and deep learning classifiers. The ECG data were obtained from the Continuously Annotated Signals of Emotion (CASE) and Wearable Stress and Affect Detection (WESAD) datasets. Based on the valence and arousal annotations, emotional states were categorized into four classes: High-Valence High-Arousal, High-Valence Low-Arousal, Low-Valence High-Arousal, and Low-Valence Low-Arousal. The ECGs were divided into 5- and 7-window segments and transformed into 2D representations using the Gramian Angular Summation Field (GASF), Markov Transition Field (MTF), Recurrence Plot (RP), and a triple-channel fusion of these three encodings. A total of 85 textural features were extracted, based on the Gray-Level Co-occurrence Matrix (GLCM), Gray-Level Run Length Matrix, Zernike Moments, Hu Moments, Fractal Dimension Texture Analysis (FDTA), and First-Order Statistics. Five classifiers, namely Random Forest, Support Vector Machine (SVM), eXtreme Gradient Boosting (XGB), a 1D Convolutional Neural Network (CNN), and a Multi-head Attention Network, were used to classify the emotional states. Classification performance varied with the image encoding technique, the segmentation approach, and the classifier. The highest Weighted F-measures (F-m) were 94.91% (RP + XGB) with the 7-window approach and 86.78% (RP + SVM) with the 5-window approach. Our proposed 1D CNN architecture achieved its highest classification metrics (F-m = 92.52%, Balanced accuracy = 92.0%, Recall = 91.96%, and Precision = 93.16%) with RP images in the 7-window approach. GLCM and FDTA features made significant contributions to the classification of emotional states. Overall, these results suggest that the proposed method holds promise for developing more accurate and efficient emotion recognition systems.
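
To make the encoding step concrete, the following is a minimal NumPy sketch of the three single-channel encodings (GASF, MTF, RP) and the triple-channel fusion described above. It is an illustrative sketch, not the paper's implementation: the segment length (256 samples), the number of MTF bins, and the RP threshold eps are placeholder values, and the MTF is shown as the raw transition field without the usual aggregation to a fixed image size.

```python
import numpy as np

def gasf(x):
    """Gramian Angular Summation Field of a 1D segment.

    The segment is min-max rescaled to [-1, 1] (assumes a non-constant
    segment) and mapped to polar angles phi = arccos(x);
    GASF[i, j] = cos(phi_i + phi_j).
    """
    x = 2 * (x - x.min()) / (x.max() - x.min()) - 1
    x = np.clip(x, -1.0, 1.0)            # guard against rounding error
    phi = np.arccos(x)
    return np.cos(phi[:, None] + phi[None, :])

def mtf(x, n_bins=8):
    """Markov Transition Field using quantile bins.

    Each sample is assigned a quantile-bin index; W holds first-order
    transition probabilities between bins, and MTF[i, j] = W[q_i, q_j].
    """
    edges = np.quantile(x, np.linspace(0, 1, n_bins + 1)[1:-1])
    q = np.digitize(x, edges)            # bin index of every sample
    W = np.zeros((n_bins, n_bins))
    for a, b in zip(q[:-1], q[1:]):      # count consecutive transitions
        W[a, b] += 1
    W /= np.maximum(W.sum(axis=1, keepdims=True), 1)  # row-normalize
    return W[q[:, None], q[None, :]]

def rp(x, eps=0.1):
    """Binary recurrence plot: RP[i, j] = 1 if |x_i - x_j| <= eps."""
    d = np.abs(x[:, None] - x[None, :])
    return (d <= eps).astype(float)

# Hypothetical stand-in for one ECG segment; a real pipeline would first
# slice the CASE/WESAD recordings into the 5- or 7-window segments.
segment = np.sin(np.linspace(0, 20 * np.pi, 256)) + 0.05 * np.random.randn(256)

# Triple-channel fusion: stack the three encodings as image channels.
image = np.stack([gasf(segment), mtf(segment), rp(segment)], axis=-1)
print(image.shape)  # (256, 256, 3)
```

In this layout each encoding contributes one channel of a single image, so the fused representation can be fed to an image-texture feature extractor or a CNN in the same way as a three-channel picture.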