Intermodal travel is widely regarded as an effective means of achieving sustainable urban transportation, so understanding the factors that influence it is crucial. Because intermodal trips make up a relatively small proportion of trips within cities, the available datasets are significantly imbalanced, which leads to poor performance of traditional logit models. In this paper, we develop a novel interpretable ensemble learning (IEL) model that identifies key factors through voting among five types of machine learning models. We test the model on two datasets with different numbers of features. The results show that travel duration, travel distance, vehicle ownership, and distance to the nearest metro station are the key factors influencing intermodal travel, together contributing nearly 70% in the JDS2021 dataset (14 features) and nearly 80% in the SHS2019 dataset (8 features). Furthermore, we analyze the interpretability of the model and compare it with the logit model. Our model enriches the methodology for modeling intermodal travel behavior.
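To make the idea of identifying key factors by voting across models concrete, the following is a minimal sketch, not the IEL procedure itself. It assumes each model reports a non-negative importance score per feature, normalizes the scores within each model, and averages them across models. The averaging rule, the function names (`vote_feature_importance`, `cumulative_share`), the three placeholder models, and all numbers are hypothetical and chosen only for illustration; they are not taken from the JDS2021 or SHS2019 results.

```python
from collections import defaultdict


def vote_feature_importance(model_importances):
    """Average per-model normalized importances (an assumed voting rule).

    model_importances maps a model name to a dict of feature -> raw score.
    Returns a dict of feature -> averaged share, sorted in descending order.
    """
    totals = defaultdict(float)
    for name, importances in model_importances.items():
        scale = sum(importances.values())
        if scale <= 0:
            raise ValueError(f"model {name!r} has no positive importance")
        # Normalize so that every model casts a vote that sums to 1.
        for feature, score in importances.items():
            totals[feature] += score / scale
    n_models = len(model_importances)
    averaged = {f: total / n_models for f, total in totals.items()}
    return dict(sorted(averaged.items(), key=lambda kv: kv[1], reverse=True))


def cumulative_share(ranked, top_k):
    """Sum the averaged shares of the top_k highest-ranked features."""
    return sum(list(ranked.values())[:top_k])


# Toy, made-up scores from three placeholder models (illustration only).
toy_scores = {
    "model_a": {"duration": 4, "distance": 3, "income": 1, "age": 2},
    "model_b": {"duration": 0.5, "distance": 0.25, "income": 0.125, "age": 0.125},
    "model_c": {"duration": 3, "distance": 3, "income": 3, "age": 1},
}

ranked = vote_feature_importance(toy_scores)
top_two_share = cumulative_share(ranked, top_k=2)
print(list(ranked)[:2], round(top_two_share, 3))
```

Normalizing within each model before averaging keeps a model with large raw scores from dominating the vote; other aggregation rules, such as rank-based voting, would fit the same interface.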