Coronal mass ejections (CMEs) are among the most intense phenomena in the Sun–Earth system, often affecting the space environment and causing geomagnetic disturbances. Predicting CME arrival time quickly and accurately is therefore crucial for minimizing harm to the near-Earth space environment. Researchers have developed diverse methods for forecasting CME arrival time over the years. While existing approaches have yielded promising results, they do not fully use the available data, because they accept either CME physical parameters or CME images as input, but not both. To address this issue, we propose a method that extracts features from both CME physical parameters and CME images and uses an attention mechanism to fuse the two types of data. First, we design a parameter feature extraction module that extracts features from CME physical parameters. Next, we adopt a convolutional neural network as our image feature extraction module to extract features from CME images. Finally, we present an attention-based feature fusion module that fuses the features extracted from the parameters and the images. Our model can thus combine physical parameters and image features, allowing it to capture significant and comprehensive information about CMEs.
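The fusion step described above can be illustrated with a minimal sketch. It is not the model's actual fusion module, whose exact form is not specified here. It assumes that both extraction modules output feature vectors of the same length and that a single query vector (hand-set here, learned in a real network) scores each modality through scaled dot-product attention. The names `attention_fuse`, `param_feat`, `image_feat`, and `query` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attention_fuse(param_feat, image_feat, query):
    """Fuse two feature vectors using attention weights.

    Assumption: each modality gets one scalar score (scaled dot product
    with `query`); the fused vector is the softmax-weighted sum.
    """
    if not (len(param_feat) == len(image_feat) == len(query)):
        raise ValueError("all vectors must have the same length")
    scale = math.sqrt(len(query))
    feats = [param_feat, image_feat]
    scores = [dot(query, f) / scale for f in feats]
    weights = softmax(scores)
    fused = [
        sum(w * f[i] for w, f in zip(weights, feats))
        for i in range(len(query))
    ]
    return fused, weights


# Toy inputs standing in for the outputs of the two extraction modules.
param_feat = [0.2, -0.1, 0.4, 0.0]   # hypothetical parameter features
image_feat = [0.5, 0.3, -0.2, 0.1]   # hypothetical CNN image features
query = [1.0, 0.0, 0.0, 0.0]         # hypothetical (normally learned) query

fused, weights = attention_fuse(param_feat, image_feat, query)
print("attention weights:", weights)
print("fused features:", fused)
```

The weights sum to one, so the fused vector is a convex combination of the two modalities. A trained model would typically learn the query and projection matrices and pass the fused features to a regression head that predicts the arrival time.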