Predicting and modeling stock price movements is of considerable economic significance in the finance sector. In recent years, a range of artificial intelligence methods, including both traditional machine learning and deep learning approaches, have been proposed for forecasting stock price fluctuations, with many reported successes. Nonetheless, identifying effective features for predicting stock movements remains challenging, primarily because financial data are non-linear, volatile, and inherently noisy. This study introduces a Deep Convolutional Transformer (DCT) model that combines convolutional neural networks, Transformers, and a multi-head attention mechanism. The model features an inception convolutional token embedding architecture alongside separable fully connected layers. Experiments on the NASDAQ, Hang Seng Index (HSI), and Shanghai Stock Exchange Composite (SSEC) datasets use Mean Absolute Error (MAE), Mean Square Error (MSE), Mean Absolute Percentage Error (MAPE), accuracy, and Matthews Correlation Coefficient (MCC) as evaluation metrics. The DCT model achieves the highest accuracy of 58.85% on the NASDAQ dataset with a sliding window width of 30 days. On the error metrics, it surpasses the other models, with the lowest average prediction error across all datasets for MAE, MSE, and MAPE. Furthermore, the DCT model attains the highest MCC values on all three datasets. These results suggest that the DCT model is promising for classifying stock price trends and that it outperforms the compared models in predicting closing prices.
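For readers unfamiliar with the five evaluation metrics named above, the following is a minimal sketch of their standard textbook definitions in plain Python. It is not the authors' evaluation code. The function names, the choice to report MAPE as a percentage, and the binary label encoding (1 for an upward move, 0 for a downward move) are assumptions made for illustration.

```python
import math


def mae(y_true, y_pred):
    """Mean Absolute Error over paired actual and predicted prices."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean Square Error over paired actual and predicted prices."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true, y_pred):
    """Mean Absolute Percentage Error, in percent (assumes no actual value is zero)."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def accuracy(labels_true, labels_pred):
    """Fraction of movement labels predicted correctly."""
    hits = sum(1 for t, p in zip(labels_true, labels_pred) if t == p)
    return hits / len(labels_true)


def mcc(labels_true, labels_pred):
    """Matthews Correlation Coefficient for binary labels (assumed: 1 = up, 0 = down)."""
    pairs = list(zip(labels_true, labels_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: return 0 when the denominator vanishes.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom
```

MAE, MSE, and MAPE apply to the closing-price regression task, while accuracy and MCC apply to the up/down trend classification task. MCC ranges from -1 to 1 and, unlike accuracy, accounts for all four cells of the confusion matrix, which makes it more informative when up and down days are imbalanced.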