Advances in technology have significantly changed how people interact on social media platforms, where reviews and comments heavily influence consumer decisions. Opinion mining has traditionally focused on textual data, overlooking the valuable insights present in customer-uploaded images, a concept we term Multus-Medium. This paper introduces a multimodal strategy for product recommendation that uses both text and image data. The proposed approach comprises data collection, preprocessing, and sentiment analysis, using a Vision Transformer (ViT) for images and SpanBERT for text reviews. The outputs of the two models are then fused to generate a final recommendation. The proposed model achieves 91.55% accuracy on the Amazon dataset and 90.89% on the Kaggle dataset. These results indicate that combining text and image evidence offers a more comprehensive and precise method for opinion mining on social media-driven product reviews, ultimately helping consumers make informed purchasing decisions.
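The fusion step can be illustrated with a minimal sketch. The abstract does not state how the two model outputs are combined, so the weighted averaging of class probabilities shown here (a simple late-fusion scheme) is an assumption rather than the authors' method. The label set, the function names `fuse_scores` and `recommend`, and the `text_weight` parameter are likewise hypothetical, and the probability vectors stand in for the outputs of the image and text sentiment models.

```python
from typing import List, Sequence

# Hypothetical label set; the paper's actual sentiment classes are not given here.
LABELS = ("negative", "neutral", "positive")


def fuse_scores(
    text_probs: Sequence[float],
    image_probs: Sequence[float],
    text_weight: float = 0.5,
) -> List[float]:
    """Late fusion by weighted averaging of per-class probabilities.

    Assumption: both inputs are probability vectors over the same classes,
    e.g. softmax outputs of a text model and an image model.
    """
    if len(text_probs) != len(image_probs) or len(text_probs) != len(LABELS):
        raise ValueError("both inputs must have one probability per label")
    if not 0.0 <= text_weight <= 1.0:
        raise ValueError("text_weight must lie in [0, 1]")
    image_weight = 1.0 - text_weight
    return [
        text_weight * t + image_weight * i
        for t, i in zip(text_probs, image_probs)
    ]


def recommend(
    text_probs: Sequence[float],
    image_probs: Sequence[float],
    text_weight: float = 0.5,
) -> bool:
    """Recommend the product when the fused top class is 'positive'."""
    fused = fuse_scores(text_probs, image_probs, text_weight)
    best = max(range(len(fused)), key=fused.__getitem__)
    return LABELS[best] == "positive"


if __name__ == "__main__":
    # Illustrative inputs only, not results from the paper.
    print(recommend([0.1, 0.2, 0.7], [0.3, 0.3, 0.4]))
```

With equal weights, neither modality dominates; raising `text_weight` would let the review text override a weak or ambiguous image signal.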