Similarity computation between images or image regions is a necessary precursor for several vision-based applications, such as retrieval, registration, and change detection. In this work, a two-channel convolutional neural network architecture is designed to retrieve an appropriate visual (VS) image representative from a repository, given a long-wave infrared (LWIR) image patch of the same region as the query. Both the VS and LWIR image regions are described using pretrained convolutional neural network models, and images are ranked by computing the similarity (or dissimilarity) between the feature vectors. Since pretrained CNNs such as VGG16, VGG19, and MobileNet are not trained on LWIR images, it is essential to evaluate and identify a suitable combination of feature descriptor and distance measure for LWIR-to-visual image similarity. The RoadScene dataset, which contains pairs of aligned long-wave infrared and visible images, is used, and the performance of CNN features and distance measures is objectively evaluated in terms of computation time, the number of correct patches in the top 5, and how close the retrieved LWIR patch is to the visual patch. Results demonstrate that the cosine similarity measure performs better than all the other distance measures in addressing the spectral variations.
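The following is a minimal sketch, not the authors' implementation, of the retrieval pipeline described above: patches are described with a pretrained VGG16 (one of the backbones named in the abstract) and ranked by cosine similarity. The file names, input size, and the use of global average pooling to obtain a single feature vector per patch are illustrative assumptions.

```python
"""Sketch: rank visible-image patches against an LWIR query patch using
pretrained VGG16 features and cosine similarity (assumed setup)."""
import numpy as np
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input
from tensorflow.keras.preprocessing import image

# Pretrained VGG16 without the classification head; global average pooling
# yields one 512-D feature vector per patch (an assumed design choice).
model = VGG16(weights="imagenet", include_top=False, pooling="avg")

def extract_features(img_path, target_size=(224, 224)):
    """Load a patch, preprocess it for VGG16, and return its feature vector."""
    img = image.load_img(img_path, target_size=target_size)
    x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
    return model.predict(x, verbose=0)[0]

def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

# Hypothetical file names: one LWIR query patch and a small VS repository.
query_vec = extract_features("lwir_query_patch.png")
repository = ["vs_patch_01.png", "vs_patch_02.png", "vs_patch_03.png"]

# Rank repository patches by cosine similarity to the LWIR query; report top 5.
scores = [(p, cosine_similarity(query_vec, extract_features(p))) for p in repository]
for path, score in sorted(scores, key=lambda s: s[1], reverse=True)[:5]:
    print(f"{path}: {score:.4f}")
```

Swapping VGG19 or MobileNet for VGG16, or replacing the cosine score with another distance measure, reproduces the kind of descriptor/measure comparison the evaluation reports.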