In a Motor Imagery (MI)-based Brain-Computer Interface (BCI), the user's intention is converted into a control signal by processing specific patterns in brain signals that reflect motor characteristics. The classification of MI Electroencephalogram (EEG) signals is constrained by the limited size of existing datasets and by a low signal-to-noise ratio. Machine learning (ML) methods, particularly Deep Learning (DL), have partly overcome these limitations. In this study, three hybrid models were proposed to classify EEG signals in MI-based BCI. Each hybrid model combines a convolutional neural network (CNN) with a Long Short-Term Memory (LSTM) network. For the first model, CNNs with different numbers of convolutional-pooling blocks (from shallow to deep) were examined, and a two-block CNN, which is not affected by the vanishing gradient problem and is still able to extract desirable features, was employed. The second and third models contained pre-trained CNNs, which allow more complex features to be explored. Transfer learning, in which knowledge learned by one model is transferred to another, and data augmentation were applied to overcome the limited size of the datasets. Transfer learning was achieved by employing two powerful pre-trained CNNs, namely ResNet-50 and Inception-v3. The continuous wavelet transform (CWT) was used to generate the input images for the CNN. The performance of the proposed models was evaluated on the BCI Competition IV dataset 2a. Mean accuracy values of 86%, 90%, and 92%, and mean Kappa values of 81%, 86%, and 88%, were obtained for the hybrid neural network with the customized CNN, the hybrid neural network with ResNet-50, and the hybrid neural network with Inception-v3, respectively. Although all three proposed models performed well, the hybrid neural network with Inception-v3 outperformed the other two.
The best result obtained in the present study improved on the previous best result in the literature by 7% in terms of classification accuracy. The findings indicate that transfer learning based on a pre-trained CNN in combination with LSTM is a novel method in MI-based BCI. The study also has implications for the discrimination of motor imagery tasks in each EEG recording channel and in different brain regions, which can reduce computational time in future work by selecting only the most effective channels.
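
The Kappa values reported alongside accuracy correct the observed agreement for the agreement expected by chance. As a minimal sketch of how such a value can be computed, the following assumes Cohen's kappa over per-trial class labels; the function name `cohen_kappa` and the toy labels are illustrative assumptions and are not the study's code or data. BCI Competition IV dataset 2a contains four motor imagery classes, so the toy example uses four labels.

```python
from collections import Counter


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e).

    p_o is the observed agreement (plain accuracy) and p_e is the
    agreement expected by chance from the label marginals.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    n = len(y_true)
    # Observed agreement: fraction of trials classified correctly.
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n

    # Chance agreement: sum over classes of P(true = c) * P(pred = c).
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)

    if p_e == 1.0:
        # Degenerate case: only one class on both sides, agreement is trivial.
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical labels: 4 classes, 5 trials each, 2 misclassified trials.
y_true = [0] * 5 + [1] * 5 + [2] * 5 + [3] * 5
y_pred = [0, 0, 0, 0, 1,
          1, 1, 1, 1, 1,
          2, 2, 2, 2, 3,
          3, 3, 3, 3, 3]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
kappa = cohen_kappa(y_true, y_pred)
print(f"accuracy = {accuracy:.2f}, kappa = {kappa:.3f}")
```

In this toy example the chance agreement is 0.25, so an accuracy of 0.90 gives a kappa of about 0.87. Mean values averaged over subjects need not follow this single-matrix relation exactly.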