Sensor-based human activity segmentation and detection have drawn increasing attention from the deep learning community because of their numerous real-world applications. Most prior work treats activity segmentation and recognition as separate processes and relies on pre-segmented sensor streams. This research proposes an unsupervised, segment-based deep learning approach for Human Activity Recognition (HAR) that emphasizes activity continuity. The approach integrates segment-based SimCLR with Segment Feature Decorrelation (SDFD) and a new framework that leverages pairs of segment data for contrastive learning of visual representations. In addition, the Channel Attention with Spatial Attention Network (CASANet) and the Secretary Bird Optimization Algorithm (SBOA) are used to improve sensor-based human activity detection: CASANet extracts key features and spatial dependencies from the sensor data, while SBOA optimizes the model for greater accuracy and generalization. Evaluations on two publicly available datasets, MHEALTH and PAMAP2, yielded an average F1 score of 98%, demonstrating the approach's effectiveness in improving activity recognition performance.
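To make the idea of contrastive learning on pairs of segment data more concrete, the following is a minimal sketch of the standard SimCLR-style NT-Xent loss applied to paired segment embeddings. It is not the implementation evaluated in this work: the function names (`l2_normalize`, `nt_xent_loss`), the temperature value, and the toy embeddings are illustrative assumptions, and the sketch omits SDFD, CASANet, and SBOA entirely.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def nt_xent_loss(view_a, view_b, temperature=0.5):
    """Standard NT-Xent loss over N positive pairs (view_a[i], view_b[i]).

    Each argument is a list of N embedding vectors; row i of view_a and
    row i of view_b are assumed to come from the same sensor segment
    (hypothetical setup, e.g. two augmentations of that segment).
    """
    if len(view_a) != len(view_b) or not view_a:
        raise ValueError("view_a and view_b must be non-empty and equally long")

    n = len(view_a)
    z = [l2_normalize(v) for v in list(view_a) + list(view_b)]

    total = 0.0
    for i in range(2 * n):
        pos = (i + n) % (2 * n)  # index of the positive partner of sample i
        # Temperature-scaled cosine similarities to every sample except i itself.
        sims = [dot(z[i], z[k]) / temperature for k in range(2 * n) if k != i]
        pos_sim = dot(z[i], z[pos]) / temperature
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(sims)
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims))
        total += log_denom - pos_sim
    return total / (2 * n)


# Toy 2-D embeddings: two segments, each with two slightly different views.
view_a = [[1.0, 0.0], [0.0, 1.0]]
view_b = [[1.0, 0.1], [0.1, 1.0]]

aligned_loss = nt_xent_loss(view_a, view_b)
# Pairing each segment with the wrong partner should give a higher loss.
mismatched_loss = nt_xent_loss(view_a, list(reversed(view_b)))
```

In this toy setup the loss is lower when each embedding is paired with the view of its own segment than when the pairs are mismatched, which is the signal a contrastive objective uses to pull views of the same segment together without labels.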