Deep learning models have gained prominence in human activity recognition with ambient sensors, particularly for telemonitoring the daily activities of older adults in real-world settings. However, collecting large volumes of annotated sensor data remains a formidable challenge because traditional manual annotation is time-consuming and costly, especially for large-scale projects. To address this challenge, we propose AttCLHAR, a novel model rooted in the self-supervised learning framework SimCLR and augmented with a self-attention mechanism. The model is designed for human activity recognition from ambient sensor data and is explicitly tailored to scenarios with few or no annotations. AttCLHAR comprises an unsupervised pre-training phase and a fine-tuning phase that share a common encoder consisting of two convolutional layers and a long short-term memory (LSTM) layer; the encoder output feeds a self-attention layer that lets the model selectively focus on different segments of the input sequence. Sharpness-aware minimization (SAM) is incorporated to improve generalization by penalizing loss sharpness. The pre-training phase learns representative features from abundant unlabeled data, capturing both spatial and temporal dependencies in the sensor signals and yielding informative features for the subsequent fine-tuning tasks. We extensively evaluated AttCLHAR on three CASAS smart home datasets (Aruba-1, Aruba-2, and Milan) and compared it against the SimCLR framework, SimCLR with SAM, and SimCLR with a self-attention layer. The experimental results demonstrate the superior performance of our approach, especially in semi-supervised and transfer learning scenarios. It outperforms existing models, marking a significant advance in using self-supervised learning to extract valuable insights from unlabeled ambient sensor data in real-world environments.
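
To make the encoder structure described above concrete, the following PyTorch sketch stacks two 1D convolutional layers, an LSTM, and a self-attention layer over the LSTM outputs. The layer widths, kernel sizes, number of attention heads, and the mean-pooling step are illustrative assumptions, not the exact configuration used in the paper.

```python
# Minimal sketch of a Conv -> LSTM -> self-attention encoder, assuming
# hypothetical layer sizes (not the paper's reported hyperparameters).
import torch
import torch.nn as nn


class AttentionEncoder(nn.Module):
    def __init__(self, n_features: int, conv_channels: int = 64,
                 lstm_hidden: int = 128, n_heads: int = 4):
        super().__init__()
        # Two convolutional layers extract spatial patterns across sensor channels.
        self.conv = nn.Sequential(
            nn.Conv1d(n_features, conv_channels, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv1d(conv_channels, conv_channels, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        # LSTM captures temporal dependencies within each sensor window.
        self.lstm = nn.LSTM(conv_channels, lstm_hidden, batch_first=True)
        # Self-attention weights different segments of the input sequence.
        self.attn = nn.MultiheadAttention(lstm_hidden, n_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, features); Conv1d expects (batch, features, time).
        h = self.conv(x.transpose(1, 2)).transpose(1, 2)
        h, _ = self.lstm(h)
        h, _ = self.attn(h, h, h)   # self-attention over LSTM outputs
        return h.mean(dim=1)        # pooled representation for a projection head


if __name__ == "__main__":
    # Example: a batch of 8 windows, 50 time steps, 30 ambient sensor channels.
    encoder = AttentionEncoder(n_features=30)
    z = encoder(torch.randn(8, 50, 30))
    print(z.shape)  # torch.Size([8, 128])
```

In the SimCLR-style setup, this shared encoder would be followed by a projection head and trained with a contrastive loss during pre-training, then reused with a classification head during fine-tuning.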