Classical machine learning algorithms typically operate on unimodal data, i.e., they analyze and make predictions from a single source (modality). Multimodal machine learning algorithms, in contrast, learn from information across multiple modalities, such as text, images, audio, and sensor data. This paper leverages multimodal machine learning (ML) to generate text from images. The proposed work presents an innovative multimodal algorithm that automates the creation of news articles from geo-tagged images by leveraging recent developments in machine learning, image captioning, and text generation. Employing a multimodal approach that combines machine-learning and transformer-based models, namely the visual geometry group network 16 (VGGNet16) and a convolutional neural network (CNN) with long short-term memory (LSTM) for captioning, the algorithm first extracts the location from the image's exchangeable image file format (Exif) metadata. Visual features are then extracted from the image and a corresponding news headline is generated. The headline is used to produce a comprehensive article with a contemporary large language model (LLM), the BigScience Large Open-science Open-access Multilingual Language Model (BLOOM). The algorithm was tested on photographs captured in real time as well as on images from the internet; in both cases, the generated news articles were validated with ROUGE and BLEU scores. The proposed work is found to be a successful attempt at automation in the field of journalism.
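
For illustration only, the sketch below outlines how such a pipeline could be wired together. It assumes Pillow for Exif parsing and the Hugging Face transformers library with the small bigscience/bloom-560m checkpoint; the VGGNet16 + CNN-LSTM headline generator described above is replaced by a hypothetical placeholder, since its architecture and weights are not given here.

```python
# Minimal sketch of the image-to-article pipeline, under the assumptions stated above.
from PIL import Image
from PIL.ExifTags import GPSTAGS
from transformers import pipeline


def dms_to_decimal(dms, ref):
    """Convert Exif degrees/minutes/seconds rationals to decimal degrees."""
    degrees = float(dms[0]) + float(dms[1]) / 60.0 + float(dms[2]) / 3600.0
    return -degrees if ref in ("S", "W") else degrees


def extract_location(image_path):
    """Read GPS latitude/longitude from the image's Exif metadata, if present."""
    exif = Image.open(image_path)._getexif() or {}
    gps = {GPSTAGS.get(k, k): v for k, v in exif.get(34853, {}).items()}  # 34853 = GPSInfo tag
    if "GPSLatitude" not in gps or "GPSLongitude" not in gps:
        return None
    return (
        dms_to_decimal(gps["GPSLatitude"], gps.get("GPSLatitudeRef", "N")),
        dms_to_decimal(gps["GPSLongitude"], gps.get("GPSLongitudeRef", "E")),
    )


def generate_headline(image_path):
    """Placeholder for the VGGNet16 feature extractor + CNN-LSTM headline generator."""
    return "Heavy rainfall floods low-lying streets"  # hypothetical output


def generate_article(headline, location, model_name="bigscience/bloom-560m"):
    """Expand the headline (and location, if available) into an article with BLOOM."""
    generator = pipeline("text-generation", model=model_name)
    prompt = f"Headline: {headline}\nLocation: {location}\nArticle:"
    return generator(prompt, max_new_tokens=200, do_sample=True)[0]["generated_text"]


if __name__ == "__main__":
    path = "geo_tagged_photo.jpg"  # hypothetical input image
    location = extract_location(path)
    headline = generate_headline(path)
    print(generate_article(headline, location))
```

In practice the generated article would then be compared against reference text with ROUGE and BLEU metrics, as reported above.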