Because mobile and wearable devices can sense daily human activity, human activity recognition (HAR) datasets have become a large-scale data resource. The data recorded by these sensors are heterogeneous and nonlinearly separable, so special techniques are required to predict human activity accurately and to mitigate this heterogeneity. Classic clustering algorithms do not work well with such data. Kernelization, which maps the data into a new feature vector representation, is therefore applied to nonlinearly separable data. This study aims to present a robust method for clustering HAR data that mitigates heterogeneity with minimal resource consumption. We propose a parallel approximated clustering approach that handles the computational cost of big data and addresses noise, heterogeneity, and nonlinearity by applying data reduction, filtering, and approximated clustering methods in a parallel computing environment, a combination that has not been previously addressed. Our key contribution is to treat HAR as a big data problem, handled with approximate kernel K-means approaches, and to bridge the gap between the cost of HAR clustering and the field of parallel computing. We implemented our approach on Google Cloud on a parallel Spark cluster, which allowed us to process large-scale HAR data across multiple machines. Normalized mutual information is used as the validation metric to assess the quality of the clustering. Additionally, precision, recall, and F-score values are computed to compare the results with those of a classification technique. The experimental results demonstrate that our clustering approach is effective compared with a classification technique, detects physical activity efficiently, and mitigates the heterogeneity of the datasets.
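As an illustration of the validation metric named above, the following minimal sketch computes normalized mutual information (NMI) between ground-truth activity labels and cluster assignments. It is not the authors' implementation. The function names are hypothetical, and the normalization by the arithmetic mean of the two entropies is an assumption, since the abstract does not state which NMI variant is used.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy (natural log) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def mutual_information(true_labels, cluster_labels):
    """Mutual information between two labelings of the same samples."""
    n = len(true_labels)
    joint = Counter(zip(true_labels, cluster_labels))
    true_counts = Counter(true_labels)
    cluster_counts = Counter(cluster_labels)
    mi = 0.0
    for (t, c), n_tc in joint.items():
        # p(t, c) * log( p(t, c) / (p(t) * p(c)) )
        mi += (n_tc / n) * log(n * n_tc / (true_counts[t] * cluster_counts[c]))
    return mi


def normalized_mutual_information(true_labels, cluster_labels):
    """NMI normalized by the arithmetic mean of the entropies (assumed variant)."""
    if len(true_labels) != len(cluster_labels):
        raise ValueError("label sequences must have the same length")
    h_true = entropy(true_labels)
    h_cluster = entropy(cluster_labels)
    if h_true == 0.0 and h_cluster == 0.0:
        # Both labelings are a single group; treat them as identical.
        return 1.0
    mi = mutual_information(true_labels, cluster_labels)
    return 2.0 * mi / (h_true + h_cluster)


# Hypothetical toy data: activity labels versus cluster ids.
activities = ["walk", "walk", "sit", "sit"]
perfect_clusters = [1, 1, 0, 0]
unrelated_clusters = [0, 1, 0, 1]
nmi_perfect = normalized_mutual_information(activities, perfect_clusters)
nmi_unrelated = normalized_mutual_information(activities, unrelated_clusters)
```

NMI does not depend on how cluster ids are numbered, which is why it suits clustering output, whereas precision, recall, and F-score require clusters to be mapped to activity classes first.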