Background: Water quality directly or indirectly affects human health and environmental well-being. Water quality data can be evaluated using a Water Quality Index (WQI), and computing the WQI is a quick and affordable way to summarise water quality accurately. Objective: The objective of this study is to find data preparation strategies for categorizing a water quality dataset from two remote Indian villages in different geographic locations, to predict water quality, and to identify low-quality water before it is made available for human consumption. Methods: Four water quality features that are crucial for human consumption, namely Nitrate, pH, Residual Chlorine, and Total Dissolved Solids (TDS), are used to determine the quality of water. The method comprises five steps: data preprocessing with min-max normalization; computing the WQI; using feature correlation to identify the importance of each parameter to the WQI; applying supervised machine learning regression models, namely Random Forest (RF), Multiple Linear Regression (MLR), Gradient Boosting (GB), and Support Vector Machine (SVM), to predict the WQI; and finally combining several machine learning classifiers, namely K-Nearest Neighbour (KNN), Support Vector Classifier (SVC), and Multi-layer Perceptron (MLP), with Logistic Regression (LR) as the meta-learner into a stack ensemble classifier that predicts the Water Quality Class (WQC) more accurately. Results: On the test data, the RF regression and MLR algorithms performed best at predicting the WQI, with mean absolute errors (MAE) of 0.003 and 0.001, respectively.
The corresponding mean squared error (MSE), root mean squared error (RMSE), coefficient of determination (R2), and Explained Variance Score (EVS) were 0.002, 0.005, 0.988, and 0.998 for RF, and 0.001, 0.031, 0.999, and 0.999 for MLR. For predicting the WQC, the stack ensemble classifier performed best, with an Accuracy of 0.936, an F1 score of 0.93, and a Matthews Correlation Coefficient (MCC) of 0.893 on the Lalpura dataset, and an Accuracy of 0.991, an F1 score of 0.991, and an MCC of 0.981 on the Heingang dataset. Conclusion: This study explores a method for predicting water quality that combines easy, feasible water quality measurements with machine learning. According to this study, the stack ensemble classifier performed best for multiclass classification. Information from this study will motivate researchers to investigate the underlying root causes of the quality variations, so that the highest quality of water can be supplied throughout the year.
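The first two Methods steps, min-max normalization and WQI computation, can be sketched as follows. The abstract does not say which WQI formulation or permissible limits the study used, so this minimal sketch assumes the common weighted arithmetic index, in which each parameter's quality rating Q_i = 100 (V_i - V_o) / (S_i - V_o) is weighted by W_i = K / S_i. The limits in `STANDARDS`, the ideal values in `IDEALS`, and the function names are hypothetical and for illustration only.

```python
from typing import Dict, List

# Hypothetical permissible limits (S_i) and ideal values (V_o) for the four
# features named in the abstract; illustrative assumptions, not the study's values.
STANDARDS: Dict[str, float] = {"pH": 8.5, "TDS": 500.0, "Nitrate": 45.0, "ResidualChlorine": 0.2}
IDEALS: Dict[str, float] = {"pH": 7.0, "TDS": 0.0, "Nitrate": 0.0, "ResidualChlorine": 0.0}


def min_max_normalize(values: List[float]) -> List[float]:
    """Rescale one feature column to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


def weighted_arithmetic_wqi(sample: Dict[str, float]) -> float:
    """Weighted arithmetic WQI: sum(W_i * Q_i) / sum(W_i), with W_i = K / S_i."""
    # K is chosen so that the unit weights W_i sum to 1.
    k = 1.0 / sum(1.0 / s for s in STANDARDS.values())
    num = den = 0.0
    for name, s_i in STANDARDS.items():
        v_o = IDEALS[name]
        q_i = 100.0 * (sample[name] - v_o) / (s_i - v_o)  # quality rating
        w_i = k / s_i  # unit weight: stricter limits get larger weights
        num += w_i * q_i
        den += w_i
    return num / den
```

In this plain form a sample at the ideal values scores 0 and a sample exactly at the limits scores 100; a value below its ideal (for example pH under 7) yields a negative rating, which is why some formulations use the absolute difference instead.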
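The WQI regression step and the metrics reported in the Results can be sketched with scikit-learn, assuming the four features have already been min-max scaled. This sketch trains the four regressors named in the Methods (using `SVR` as the SVM regressor) on a hold-out split and reports MAE, MSE, RMSE, R2, and EVS. The synthetic data, the 80/20 split, the hyperparameters, and the function name `evaluate_regressors` are assumptions, not the study's setup.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import (explained_variance_score, mean_absolute_error,
                             mean_squared_error, r2_score)
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR


def evaluate_regressors(X, y, seed: int = 0) -> dict:
    """Fit RF, MLR, GB and SVM regressors on an 80/20 split and score each on the test part."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    models = {
        "RF": RandomForestRegressor(n_estimators=200, random_state=seed),
        "MLR": LinearRegression(),
        "GB": GradientBoostingRegressor(random_state=seed),
        "SVM": SVR(),
    }
    results = {}
    for name, model in models.items():
        pred = model.fit(X_train, y_train).predict(X_test)
        mse = mean_squared_error(y_test, pred)
        results[name] = {
            "MAE": mean_absolute_error(y_test, pred),
            "MSE": mse,
            "RMSE": float(np.sqrt(mse)),  # RMSE is the square root of MSE
            "R2": r2_score(y_test, pred),
            "EVS": explained_variance_score(y_test, pred),
        }
    return results


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    X = rng.random((300, 4))  # synthetic stand-in for four min-max-scaled features
    y = X @ np.array([0.1, 0.3, 0.2, 0.4])  # synthetic WQI: a weighted sum of features
    for name, scores in evaluate_regressors(X, y).items():
        print(name, {k: round(v, 4) for k, v in scores.items()})
```

On this synthetic target, which is an exact linear function of the features, MLR recovers the relationship almost perfectly, while the tree-based models only approximate it with piecewise-constant fits.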
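The stack ensemble classifier described in the Methods maps naturally onto scikit-learn's `StackingClassifier`: KNN, SVC, and MLP act as base learners, and Logistic Regression is the meta-learner trained on their cross-validated predictions. This is a minimal sketch on synthetic three-class data. The hyperparameters, the 5-fold internal cross-validation, the weighted F1 averaging, and the helper names `build_stack` and `evaluate_classifier` are assumptions, since the abstract does not specify them.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, matthews_corrcoef
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC


def build_stack(seed: int = 0) -> StackingClassifier:
    """KNN, SVC and MLP base learners stacked under a Logistic Regression meta-learner."""
    base_learners = [
        ("knn", KNeighborsClassifier(n_neighbors=5)),
        ("svc", SVC(random_state=seed)),
        ("mlp", MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000, random_state=seed)),
    ]
    return StackingClassifier(
        estimators=base_learners,
        final_estimator=LogisticRegression(max_iter=1000),
        cv=5,  # meta-learner is fitted on out-of-fold base predictions
    )


def evaluate_classifier(model, X, y, seed: int = 0) -> dict:
    """Fit on an 80/20 stratified split and report Accuracy, weighted F1 and MCC."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed
    )
    pred = model.fit(X_train, y_train).predict(X_test)
    return {
        "Accuracy": accuracy_score(y_test, pred),
        "F1": f1_score(y_test, pred, average="weighted"),
        "MCC": matthews_corrcoef(y_test, pred),
    }


if __name__ == "__main__":
    # Synthetic stand-in for four scaled features and three water quality classes.
    X, y = make_classification(
        n_samples=400, n_features=4, n_informative=3, n_redundant=0,
        n_classes=3, n_clusters_per_class=1, random_state=0,
    )
    print(evaluate_classifier(build_stack(), X, y))
```

Using internal cross-validation (`cv=5`) means the meta-learner never sees base-learner predictions made on their own training rows, which reduces the risk of it simply trusting whichever base model overfits most. MCC is a useful companion to Accuracy here because it stays informative when the water quality classes are imbalanced.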