Water pollution has become a major concern in recent years, affecting over 2 billion people worldwide, according to UNESCO. It can arise naturally, as with algal blooms, or from human activity, when toxic substances are released into water bodies such as lakes, rivers, springs, and oceans. To address this issue and monitor surface-level water pollution in local water bodies, an informative real-time vision-based surveillance system has been developed in conjunction with large language models (LLMs). The system has an integrated camera connected to a Raspberry Pi, which processes the input frames, and is linked to LLMs that generate contextual information about the type, causes, and impact of pollutants on both human health and the environment. This multi-model setup enables local authorities to monitor water pollution and take the necessary steps to mitigate it. To achieve accurate detection, the vision model was trained on seven major types of pollutants found in water bodies: algal blooms, synthetic foams, dead fish, oil spills, wooden logs, industrial waste run-offs, and trash. The ChatGPT API has been integrated with the model to generate contextual information about the pollution detected. The multi-model system can thus conduct surveillance over water bodies and autonomously alert local authorities to take immediate action, eliminating the need for human intervention.

PRACTITIONER POINTS: Combines cameras and LLMs with a Raspberry Pi for processing and generating pollutant information. Uses YOLOv5 to detect algal blooms, synthetic foams, dead fish, oil spills, and industrial waste. Supports various modules and environments, including drones and mobile apps, for broad monitoring. Educates on environmental health and alerts authorities about water pollution.
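
The hand-off from the vision model to the LLM described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `Detection` type, the `filter_detections` and `build_prompt` helpers, the 0.5 confidence threshold, and the prompt wording are all hypothetical and chosen only for illustration. The class labels come from the seven pollutant types listed in the abstract. The detector itself and the ChatGPT API call are deliberately left out, so the sketch only shows how per-frame detections might be filtered and turned into a request for contextual information.

```python
from dataclasses import dataclass
from typing import List, Optional

# Labels follow the seven pollutant types named in the abstract.
POLLUTANT_CLASSES = (
    "algal bloom",
    "synthetic foam",
    "dead fish",
    "oil spill",
    "wooden log",
    "industrial waste run-off",
    "trash",
)


@dataclass(frozen=True)
class Detection:
    """One detector output for a frame (hypothetical structure)."""
    label: str
    confidence: float


def filter_detections(detections: List[Detection],
                      min_confidence: float = 0.5) -> List[Detection]:
    """Keep known pollutant classes at or above the threshold.

    The 0.5 default is an assumed value, not one reported by the source.
    """
    return [
        d for d in detections
        if d.label in POLLUTANT_CLASSES and d.confidence >= min_confidence
    ]


def build_prompt(detections: List[Detection]) -> Optional[str]:
    """Build an LLM prompt asking about the detected pollutants.

    Returns None when nothing was detected, so the caller can skip
    the LLM request for clean frames.
    """
    labels = sorted({d.label for d in detections})
    if not labels:
        return None
    return (
        "The following pollutants were detected on the surface of a "
        "local water body: " + ", ".join(labels) + ". "
        "For each one, describe its type, likely causes, and impact "
        "on human health and the environment."
    )


if __name__ == "__main__":
    frame_detections = [
        Detection("oil spill", 0.91),
        Detection("trash", 0.30),      # below the assumed threshold
        Detection("dead fish", 0.75),
        Detection("boat", 0.99),       # not a pollutant class
    ]
    prompt = build_prompt(filter_detections(frame_detections))
    print(prompt)
```

In a deployed system the returned prompt would be sent to the LLM and the response forwarded to local authorities with the alert. Returning `None` for frames without detections is one way to avoid unnecessary API requests from the Raspberry Pi.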