Although driving automation promises to improve driving safety, drivers of conditionally automated vehicles, defined by the Society of Automotive Engineers (SAE) as SAE L3 vehicles, must still be ready to retake control. Drivers’ states can therefore still affect driving safety in SAE L3 vehicles. Previous research has found that a high cognitive load may impair drivers’ takeover performance, so estimating drivers’ cognitive load remains necessary in SAE L3 vehicles. However, existing driver cognitive load estimation algorithms mostly target vehicles with a lower level of driving automation (e.g., SAE L0) and may not transfer to SAE L3 vehicles, because drivers’ responsibilities differ and several commonly used measures (e.g., driving performance) are unavailable when drivers are not continuously controlling the vehicle. Further, previous driver cognitive load estimation algorithms have rarely considered the temporal information in the input features. We therefore propose a deep-learning algorithm to estimate driver cognitive load in SAE L3 vehicles that integrates multiple physiological features (i.e., electrocardiogram, electrodermal activity, and respiration) and captures the temporal correlation of the data using a transformer-encoder-based network. We compared the performance of our algorithm with baseline models on an open data set. Our algorithm outperformed the baseline models, achieving an accuracy of 94.4% with a within-subject data partition (proportionally splitting the data from each subject between the training and testing data sets) and an accuracy of 89% with an across-subjects data partition (assigning all data from a given subject to either the training or the testing data set).
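The two data partition schemes can be illustrated with a minimal sketch. This is not the implementation used in the study: the function names, the 20% test fraction, the random shuffling within each subject, and the toy subject labels are all assumptions made for illustration. Each sample is represented only by the ID of the subject it came from, and the functions return sample indices.

```python
import random
from collections import defaultdict


def within_subject_split(subject_ids, test_fraction=0.2, seed=0):
    """Split each subject's samples proportionally into train and test.

    Every subject contributes samples to both sets (hypothetical helper).
    """
    rng = random.Random(seed)
    by_subject = defaultdict(list)
    for index, subject in enumerate(subject_ids):
        by_subject[subject].append(index)

    train_idx, test_idx = [], []
    for subject in sorted(by_subject):
        indices = by_subject[subject][:]
        rng.shuffle(indices)  # assumption: random, not chronological, split
        n_test = round(test_fraction * len(indices))
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


def across_subjects_split(subject_ids, test_subjects):
    """Assign all samples of the held-out subjects to the test set.

    No subject appears in both sets (hypothetical helper).
    """
    held_out = set(test_subjects)
    train_idx = [i for i, s in enumerate(subject_ids) if s not in held_out]
    test_idx = [i for i, s in enumerate(subject_ids) if s in held_out]
    return train_idx, test_idx


# Toy example: 5 subjects with 10 samples each (made-up labels, not study data).
subject_ids = [subject for subject in range(5) for _ in range(10)]

w_train, w_test = within_subject_split(subject_ids, test_fraction=0.2)
a_train, a_test = across_subjects_split(subject_ids, test_subjects=[4])
```

In the within-subject case, the model sees data from every subject during training, so the test accuracy reflects generalization to new samples from known drivers. In the across-subjects case, the test subjects are entirely unseen, which is the harder setting and is consistent with the lower accuracy reported for that partition.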