Introduction: Content Based Image Retrieval (CBIR) system is an innovative technology to retrieve images from various media types. One of the CBIR applications is Content Based Medical Image Retrieval (CBMIR). The image retrieval system retrieves the most similar images from the historical cases, and such systems can only support the physician's decision to diagnose a disease. To extract the useful features from the query image for linking similar types of images is the major challenge in the CBIR domain. The Convolution Neural Network (CNN) can overcome the drawbacks of traditional algorithms, dependent on the low-level feature extraction technique. Objective: The objective of the study is to develop a CNN model with a minimum number of convolution layers and to get the maximum possible accuracy for the CBMIR system. The minimum number of convolution layers reduces the number of mathematical operations and the time for the model's training. It also reduces the number of training parameters, like weights and bias. Thus, it reduces the memory requirement for the model storage. This work mainly focused on developing an optimized CNN model for the CBMIR system. Such systems can only support the physicians' decision to diagnose a disease from the images and retrieve the relevant cases to help the doctor decide the precise treatment. Methods: The deep learning-based model is proposed in this paper. The experiment is done with several convolution layers and various optimizers to get the maximum accuracy with a minimum number of convolution layers. Thus, the ten-layer CNN model is developed from scratch and used to derive the training and testing images' features and classify the test image. Once the image class is identified, the most relevant images are determined based on the Euclidean distance between the query features and database features of the identified class. Based on this distance, the most relevant images are displayed from the respective class of images. The general dataset CIFAR10, which has 60,000 images of 10 different classes, and the medical dataset IRMA, which has 2508 images of 9 various classes, have been used to analyze the proposed method. The proposed model is also applied for the medical x-ray image dataset of chest disease and compared with the other pre-trained models. Results: The accuracy and the average precision rate are the measurement parameters utilized to compare the proposed model with different machine learning techniques. The accuracy of the proposed model for the CIFAR10 dataset is 93.9%, which is better than the state-of-the-art methods. After the success for the general dataset, the model is also tested for the medical dataset. For the x-ray images of the IRMA dataset, it is 86.53%, which is better than the different pre-trained model results. The model is also tested for the other x-ray dataset, which is utilized to identify chest-related disease. The average precision rate for such a dataset is 97.25%. Also, the proposed model fulfills the major challenge of the semantic gap. The semantic gap of the proposed model for the chest disease dataset is 2.75%, and for the IRMA dataset, it is 13.47%. Also, only ten convolution layers are utilized in the proposed model, which is very small in number compared to the other pre-trained models. Conclusion: The proposed technique shows remarkable improvement in performance metrics over CNN-based state-of-the-art methods. It also offers a significant improvement in performance metrics over different pre-trained models for the two different medical x-ray image datasets.