In times of crisis, the prompt and precise classification of disaster-related information shared on social media is of paramount importance for effective disaster response and public safety. During such events, people use social media to communicate, sharing multimodal textual and visual content. However, owing to the substantial influx of unfiltered and diverse data, humanitarian organizations struggle to leverage this information effectively. Numerous methods have been proposed for classifying disaster-related content, but they do not model user credibility, emotional context, or social interactions, all of which are crucial for classification. We therefore propose CrisisSpot, a method that leverages a Graph-based Neural Network to capture intricate relationships between textual and visual modalities, together with Social Context Features that incorporate user-centric and content-centric information. We also propose Inverted Dual Embedded Attention, which captures both harmonious and contrary patterns in the data, harnessing complex interactions to yield richer insights from multimodal content. In addition, we have developed TSEqD (Turkey-Syria Earthquake Dataset), a large annotated multimodal dataset for a single disaster event containing 10,352 data samples. Through extensive experimentation, CrisisSpot demonstrates significant improvements, achieving average F1-score gains of 9.45% and 5.01% over state-of-the-art methods on the publicly available CrisisMMD dataset and the TSEqD dataset, respectively.