Human Activity Recognition (HAR) is the task of automatically analyzing and recognizing human body gestures or actions. HAR from time-series multi-modal sensory data is a challenging and important problem in machine learning and feature engineering, driven by growing demand in real-world applications such as healthcare, sports, and surveillance. Everyday wearable devices, e.g., smartphones, smartwatches, and smart glasses, can be used to collect and analyze human activities on an unprecedented scale. This paper presents a generic framework to recognize different human activities from the continuous time-series multimodal sensory data of these smart gadgets. The proposed framework follows the Bag-of-Features pipeline, which consists of four steps: (i) data acquisition and pre-processing, (ii) codebook computation, (iii) feature encoding, and (iv) classification. Each step plays a significant role in generating an appropriate feature representation of the raw sensory data for efficient activity recognition. In the first step, we employ a simple overlapped-window sampling approach to segment the continuous time-series sensory data into sub-sequences suitable for activity recognition. Second, we build a codebook using the k-means clustering algorithm to group similar sub-sequences; the center of each group, known as a codeword, is assumed to represent a specific movement in the activity sequence. The third step, feature encoding, transforms the raw sensory data of an activity sequence into a high-level representation for classification. Specifically, we present three reconstruction-based encoding techniques: Sparse Coding, Local Coordinate Coding, and Locality-constrained Linear Coding. The segmented activity sub-sequences are transformed into high-level representations using these techniques and the previously computed codebook.
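The first three steps of the pipeline can be sketched as follows. This is a minimal illustration, not the paper's implementation: the window length, stride, codebook size, and k-means iteration count are illustrative choices, and the encoder shown is the standard approximated solution for Locality-constrained Linear Coding (one of the three encoders the framework supports).

```python
import numpy as np

def sliding_windows(signal, win_len, stride):
    """Step (i): segment a (T, channels) stream into overlapped
    windows, each flattened into a fixed-length sub-sequence vector."""
    starts = range(0, signal.shape[0] - win_len + 1, stride)
    return np.stack([signal[s:s + win_len].ravel() for s in starts])

def kmeans_codebook(X, n_codewords, n_iter=20, seed=0):
    """Step (ii): plain k-means; each cluster center is a codeword."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), n_codewords, replace=False)]
    for _ in range(n_iter):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = dists.argmin(1)
        for j in range(n_codewords):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(0)
    return centers

def llc_encode(X, codebook, knn=5, beta=1e-4):
    """Step (iii): approximated LLC coding -- reconstruct each
    sub-sequence from its knn nearest codewords, codes sum to one."""
    codes = np.zeros((len(X), len(codebook)))
    for i, x in enumerate(X):
        idx = ((codebook - x) ** 2).sum(1).argsort()[:knn]
        z = codebook[idx] - x                     # shift basis to origin
        C = z @ z.T                               # local covariance
        C += beta * np.trace(C) * np.eye(knn)     # regularize
        w = np.linalg.solve(C, np.ones(knn))
        codes[i, idx] = w / w.sum()               # sum-to-one constraint
    return codes

# Toy run on a synthetic 3-axis accelerometer stream of 600 samples.
rng = np.random.default_rng(1)
stream = rng.standard_normal((600, 3))
subseqs = sliding_windows(stream, win_len=32, stride=16)   # 50% overlap
codebook = kmeans_codebook(subseqs, n_codewords=16)
features = llc_encode(subseqs, codebook)
# Pooling the codes over all sub-sequences yields the high-level
# representation that step (iv) feeds to the classifier.
activity_vec = features.max(axis=0)
print(subseqs.shape, features.shape, activity_vec.shape)
```

Each row of `features` is a sparse, locality-aware encoding of one sub-sequence against the learned codebook; pooling those rows produces the fixed-length vector that a classifier such as Random Forest consumes in the final step.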
Finally, the encoded features are classified using a simple Random Forest classifier. The proposed HAR framework, with each of the three encoding techniques, is evaluated on three large benchmark datasets, UniMiB-SHAR, MHEALTH, and WISDM, and the results are compared with recent state-of-the-art techniques. The framework outperforms the existing techniques, achieving recognition scores of 98.39%, 99.5%, and 99.4% on the UniMiB-SHAR, MHEALTH, and WISDM datasets, respectively. These recognition results, together with a computational analysis, confirm the effectiveness and efficiency of the proposed framework.