Crop-related problems such as pests and diseases cause yearly losses exceeding $500 billion in India, and leaf blight is identified as the principal cause of these losses. Farmers who cultivate forage and grain sorghum are affected most severely. The disease also has a significant impact on other crops, including maize, rice, tomato, potato, millet, and onion. Timely detection and evaluation of plant disease can help to limit the associated losses. However, the task is difficult because of variations in crop species, varieties of crop disease, and environmental factors. Current methodologies do not generalise well when classifying and predicting diseases. All of the techniques employed in this study are applied to a dataset with predetermined input values and corresponding output values. Current methodologies preprocess the images and perform segmentation to extract the appropriate characteristics. Segmentation in turn requires pre-processing techniques such as dilation and edge detection. As a consequence, crucial information is lost, which leads to inaccurate classification. Furthermore, the methodologies employed so far have not been designed to evaluate the performance of the algorithm on specialised or specific datasets, and deep learning methodologies are susceptible to overfitting. This paper proposes an approach (MCIP) for extracting and analysing crop image data using PySpark data frames. The MCIP framework employs Principal Component Analysis (PCA) to select pertinent features. The resulting PCA features are then used to identify homogeneous subgroups with the K-means algorithm.
The categorised predictive output is used to identify and detect diseases in potato leaves. The Multispectral Crop Imaging Platform (MCIP) is not limited to potatoes, as it can also identify diseases in the foliage of other agricultural crops. To validate this claim, we applied the MCIP algorithm to a rice disease dataset. To assess the robustness of MCIP, we evaluated its accuracy, silhouette score, speed, and F1 score. The MCIP model performed well in both speed and accuracy compared with existing approaches, with accuracy close to 100 percent.
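
The PCA-then-K-means pipeline summarised above can be illustrated with a minimal sketch. It is not the authors' implementation: it uses plain NumPy in place of PySpark so that it runs without a Spark cluster, and synthetic feature vectors in place of leaf images. The function names (`pca_project`, `kmeans`, `silhouette_score`), the feature dimension, and the cluster count are assumptions chosen for illustration. In PySpark, the analogous stages would be `pyspark.ml.feature.PCA`, `pyspark.ml.clustering.KMeans`, and `pyspark.ml.evaluation.ClusteringEvaluator`.

```python
import numpy as np


def pca_project(X, n_components):
    """Project the rows of X onto the top principal components."""
    Xc = X - X.mean(axis=0)
    # SVD of the centred data; the rows of Vt are the principal directions.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


def kmeans(Z, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm with random initial centres."""
    rng = np.random.default_rng(seed)
    centers = Z[rng.choice(len(Z), size=k, replace=False)]
    for _ in range(n_iter):
        # Distance from every point to every centre, then nearest centre.
        dist = np.linalg.norm(Z[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([
            Z[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


def silhouette_score(Z, labels):
    """Mean silhouette coefficient; assumes at least two clusters."""
    dist = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=2)
    scores = []
    for i in range(len(Z)):
        same = labels == labels[i]
        same[i] = False  # exclude the point itself
        if not same.any():
            scores.append(0.0)  # convention for singleton clusters
            continue
        a = dist[i, same].mean()
        b = min(dist[i, labels == c].mean()
                for c in np.unique(labels) if c != labels[i])
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))


# Synthetic stand-in for image features: two groups of 40 samples, 64 values each.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (40, 64)),
               rng.normal(3.0, 1.0, (40, 64))])

Z = pca_project(X, n_components=5)   # feature reduction
labels, _ = kmeans(Z, k=2)           # homogeneous subgroups
score = silhouette_score(Z, labels)  # cluster quality
print(labels, score)
```

The silhouette score is computed on the reduced features because, in this sketch, clustering also takes place in that space.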
 
Index Terms— Agriculture, clustering, data mining, k-means, PCA, PySpark