Human Activity Recognition (HAR) can be defined as the automatic prediction of regular human activities performed in day-to-day life, such as walking, running, cooking, and office work. HAR is particularly beneficial in medical care services, for example in personal health-care assistants, elderly care, and the maintenance of patient records for future reference. Input data to a HAR system can be (a) videos or still images capturing human activities, or (b) time-series data of human body movements, recorded while the activities are performed by sensors embedded in smart devices, such as accelerometers and gyroscopes. In this work, we focus on the second category of input data. We propose an ensemble of three classification models, namely CNN-Net, Encoded-Net, and CNN-LSTM, which we name EnsemConvNet. Each of these classification models is built upon a simple one-dimensional Convolutional Neural Network (1D CNN) but differs in the number of dense layers, the kernel sizes used, and other key architectural details. Each model accepts the time-series data as a 2D matrix, taking a window of data at a time to infer information, and ultimately predicts the type of human activity. The classification outcome of the EnsemConvNet model is decided using various classifier combination methods, including majority voting, the sum rule, the product rule, and a score-fusion approach called the adaptive weighted approach. Three benchmark datasets, namely WISDM activity prediction, UniMiB SHAR, and MobiAct, are used to evaluate the proposed model. We compare EnsemConvNet with existing deep learning models such as the Multi-Headed CNN and hybrids of CNN and Long Short-Term Memory (LSTM) models. The results obtained establish the superiority of the EnsemConvNet model over the other models mentioned.
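The classifier combination step described above can be sketched as follows. This is a minimal illustrative example, not the paper's implementation: the per-model class probabilities and the validation-accuracy weights used for the adaptive weighted fusion are made-up values standing in for the outputs of CNN-Net, Encoded-Net, and CNN-LSTM.

```python
import numpy as np

# Hypothetical softmax outputs of the three base models for one input
# window over three activity classes (rows: CNN-Net, Encoded-Net, CNN-LSTM).
probs = np.array([
    [0.7, 0.2, 0.1],   # CNN-Net
    [0.6, 0.3, 0.1],   # Encoded-Net
    [0.2, 0.5, 0.3],   # CNN-LSTM
])

# Majority voting: each model votes for its most probable class.
votes = np.argmax(probs, axis=1)
majority = np.bincount(votes, minlength=probs.shape[1]).argmax()

# Sum rule: add the class probabilities across models, pick the largest sum.
sum_rule = np.argmax(probs.sum(axis=0))

# Product rule: multiply the class probabilities across models.
product_rule = np.argmax(probs.prod(axis=0))

# Adaptive weighted fusion (illustrative): scale each model's scores by a
# per-model weight, e.g. its held-out validation accuracy (assumed values).
weights = np.array([0.92, 0.90, 0.88])
weighted = np.argmax((weights[:, None] * probs).sum(axis=0))

print(majority, sum_rule, product_rule, weighted)
```

All four rules operate on the same matrix of per-model class scores, so swapping the combination strategy requires no change to the base classifiers themselves.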