Deep learning (DL) networks, such as convolutional neural networks (CNNs) and long short-term memory (LSTM) networks, have gained popularity for bearing fault diagnosis from raw vibration signals. However, their accuracy and stability degrade on imbalanced real-world datasets. This research investigates the impact of dataset imbalance and compares signal-processed network inputs against the direct use of raw vibration signals. The DL techniques studied are LSTM, one-dimensional (1D) CNN, two-dimensional (2D) CNN, and a novel hybrid 2DCNNLSTM algorithm. The networks incorporate signal processing methods such as the Fourier transform and the continuous wavelet transform while maintaining nearly equal parameter counts and the same base architecture. The proposed hybrid 2DCNNLSTM algorithm combines the strengths of LSTM and CNN, improving bearing diagnosis by capturing both spatial and temporal information in vibration signals. It also accepts a multi-channel input that augments the raw vibration signal with mean and variance channels to extract meaningful features and enhance classification efficiency. Three publicly available datasets are used to test the accuracy, effectiveness, robustness, and stability of the proposed networks: the Case Western Reserve University bearing test rig benchmark dataset with ten fault classes, the Paderborn University dataset with three fault classes, and the NASA Centre for Intelligent Maintenance Systems bearing dataset with five fault classes. The studies reveal that the hybrid 2DCNNLSTM-based networks outperform both CNN and LSTM networks, even without input processing. Further, with a 2DCNNLSTM-based network, a multi-channel input that augments the 2D raw signal with mean and variance channels handles imbalanced and complex datasets more efficiently.
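The multi-channel input described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the mean and variance channels are computed, so the sketch assumes they are taken per image row and repeated along that row. The function name `to_multichannel`, the 64 x 64 image size, and the synthetic test signal are all hypothetical choices made for illustration.

```python
import math
from statistics import fmean, pvariance


def to_multichannel(signal, side):
    """Fold a 1D segment into a side x side image and add mean/variance channels.

    Assumption: the mean and variance are computed per image row and repeated
    along that row. The source does not specify how these channels are built.
    """
    if len(signal) != side * side:
        raise ValueError("segment length must equal side * side")
    # Channel 0: raw samples folded row by row into a 2D grid.
    raw = [list(signal[r * side:(r + 1) * side]) for r in range(side)]
    # Channel 1: each row's mean, repeated across the row.
    mean = [[fmean(row)] * side for row in raw]
    # Channel 2: each row's population variance, repeated across the row.
    var = [[pvariance(row)] * side for row in raw]
    return [raw, mean, var]  # nested lists with shape (3, side, side)


# Synthetic stand-in for a vibration segment (not real bearing data).
segment = [math.sin(0.05 * n) + 0.1 * math.sin(1.3 * n) for n in range(64 * 64)]
image = to_multichannel(segment, side=64)
```

The resulting three-channel array has the same layout as an RGB image, so under these assumptions it could be passed to a 2D convolutional front end without changing the rest of the network.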