A key factor in making the transition from autonomous to manual driving control safer is quantifying how long a safe takeover requires, namely the takeover time. Previous studies have examined the factors that influence takeover time and have predicted it using traditional methods, but there is a lack of research that uses deep learning networks together with driver physiological factors for takeover time prediction. To this end, we propose a hybrid deep learning model for predicting drivers’ takeover time, using a dataset obtained from a bench experiment involving 46 drivers. First, to address the information loss commonly associated with manual feature extraction, a convolutional neural network (CNN) is developed to automatically extract spatial representations from the drivers’ physiological signals. Second, a double-layer bidirectional long short-term memory network (DBiLSTM) is incorporated to model the temporal dimension of these signals, yielding a more refined representation. Third, to handle the complexity introduced by variation between individual drivers, a self-attention mechanism is introduced that enables the model to automatically prioritize critical features, thereby improving the accuracy and reliability of its predictions. The results show that the proposed approach outperforms traditional machine learning models in takeover time prediction accuracy, offering a new approach within vehicle takeover control research. This provides a more reliable technical foundation for anticipating vehicle takeover durations.
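To make the self-attention step more concrete, the following is a minimal sketch of scaled dot-product self-attention over a short sequence of feature vectors. It is not the implementation used in this work: it assumes identity query, key, and value projections in place of learned weights, and the names `softmax`, `self_attention`, and the toy `features` sequence are hypothetical, standing in for per-time-step outputs of the recurrent layers.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(seq):
    """Scaled dot-product self-attention with identity Q/K/V projections.

    seq: list of T feature vectors, each a list of d floats.
    Returns (outputs, weights); weights[t] is the attention distribution
    of time step t over all time steps and sums to 1.
    """
    d = len(seq[0])
    scale = math.sqrt(d)
    outputs, weights = [], []
    for q in seq:
        # Similarity of this time step to every time step (including itself).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in seq]
        w = softmax(scores)
        weights.append(w)
        # Weighted sum of all time steps, one value per feature dimension.
        outputs.append(
            [sum(w[t] * seq[t][j] for t in range(len(seq))) for j in range(d)]
        )
    return outputs, weights


# Hypothetical toy input: 3 time steps, 2 features each (assumed values).
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended, attn = self_attention(features)
```

Each row of `attn` shows how strongly one time step weights the others, which is the sense in which the mechanism prioritizes features. A trained model would learn separate projection matrices rather than use the raw features directly.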