Multi-exposure image fusion is a technique for obtaining high dynamic range images. Owing to its low cost and high efficiency, it has attracted considerable attention from researchers in recent years. Most existing deep learning-based multi-exposure image fusion methods extract features from differently exposed images with a single feature extraction scheme, while others simply rely on two separate modules to extract features directly. Either way, some feature information is inevitably lost during extraction, which in turn degrades model performance. To minimize this loss, we propose an ultra-high-definition (UHD) multi-exposure image fusion method based on multi-scale feature extraction. The method adopts a U-shaped structure for the overall network, which fully exploits feature information at different levels. In addition, we construct a novel hybrid stacking paradigm that combines convolutional neural networks with Transformer modules, enabling the combined module to extract local texture features and global color features simultaneously. To fuse and extract features more efficiently, we further design a cross-layer feature fusion module that adaptively learns the correlations between features at different layers. Extensive quantitative and qualitative results demonstrate that the proposed method performs well on UHD multi-exposure image fusion.
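
To illustrate the general idea of stacking a convolutional branch (local texture) alongside a Transformer branch (global context), the following PyTorch sketch shows one possible hybrid block. It is a minimal sketch under assumed design choices, not the paper's actual module: the names `HybridBlock`, `dim`, and `num_heads`, and the choice of a single `nn.TransformerEncoderLayer` over flattened spatial tokens, are all hypothetical.

```python
# Hypothetical sketch of a hybrid CNN + Transformer block, assuming PyTorch.
# Module and parameter names are illustrative and not taken from the paper.
import torch
import torch.nn as nn


class HybridBlock(nn.Module):
    """Extracts local texture features with convolutions and global
    context with a Transformer encoder layer, then fuses both paths."""

    def __init__(self, dim: int = 32, num_heads: int = 4):
        super().__init__()
        # Local branch: small convolutions to capture texture detail.
        self.local = nn.Sequential(
            nn.Conv2d(dim, dim, kernel_size=3, padding=1),
            nn.GELU(),
            nn.Conv2d(dim, dim, kernel_size=3, padding=1),
        )
        # Global branch: one Transformer encoder layer over spatially
        # flattened tokens (batch_first=True expects [B, N, C]).
        self.global_ctx = nn.TransformerEncoderLayer(
            d_model=dim, nhead=num_heads, batch_first=True
        )
        # 1x1 convolution to fuse the concatenated branches.
        self.fuse = nn.Conv2d(2 * dim, dim, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        local_feat = self.local(x)
        tokens = x.flatten(2).transpose(1, 2)           # [B, H*W, C]
        global_feat = self.global_ctx(tokens)
        global_feat = global_feat.transpose(1, 2).reshape(b, c, h, w)
        return self.fuse(torch.cat([local_feat, global_feat], dim=1))


if __name__ == "__main__":
    # Toy usage on a small feature map; real UHD inputs would need
    # downsampling or windowing to keep attention tractable.
    feats = torch.randn(1, 32, 64, 64)
    out = HybridBlock(dim=32, num_heads=4)(feats)
    print(out.shape)  # torch.Size([1, 32, 64, 64])
```

In this sketch the two branches are fused by simple concatenation and a 1x1 convolution; the cross-layer feature fusion module described above would instead learn correlations between features from different levels of the U-shaped network.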