ABSTRACT With the increasing frequency of floods, in-depth flood event analyses are essential for effective disaster relief and prevention. Owing to their broad availability, satellite-based flood event datasets have replaced limited disaster maps as the primary data source for such analyses. Nevertheless, despite the vast amount of available remote sensing imagery, existing flood event datasets still pose significant challenges for flood event analyses because of the uneven geographical distribution of data, the scarcity of time series data, and the limited availability of flood-related semantic information. Deep learning models have been increasingly adopted for flood event analyses, yet many existing flood datasets are poorly suited to model training, and distinguishing flooded areas remains difficult when data modalities and semantic information are limited. Moreover, efficient retrieval and pre-screening of flood-related imagery from vast satellite archives remain challenging, particularly in large-scale analyses. To address these issues, we propose a Multimodal Flood Event Dataset (MFED) for deep-learning-based flood event analysis and data retrieval. It comprises 18 years of multi-source remote sensing imagery and heterogeneous textual information covering flood-prone areas worldwide. Incorporating both optical and radar imagery exploits the correlation and complementarity between image modalities to better capture pixel-level features of flooded areas. Notably, the text modality, including auxiliary hydrological information extracted from the Global Flood Database and textual records refined from online news, offers a semantic supplement to the imagery for flood event retrieval and analysis. To assess the applicability of MFED to deep learning models, we conducted experiments with different models using a single modality and various combinations of modalities, which demonstrated the effectiveness of the dataset. Furthermore, we validate the efficiency of MFED through comparative experiments against existing multimodal datasets and across diverse neural network architectures.