Nearest neighbor search, also known as NNS, is a technique used to locate the points in a high-dimensional space closest to a given query point. This technique has multiple applications in medicine, such as searching large medical imaging databases, disease classification, and diagnosis. However, when the number of points is significantly large, the brute-force approach for finding the nearest neighbor becomes computationally infeasible. Therefore, various approaches have been developed to make the search faster and more efficient to support the applications. With a focus on medical imaging, this paper proposes DenseLinkSearch (DLS), an effective and efficient algorithm that searches and retrieves the relevant images from heterogeneous sources of medical images. Towards this, given a medical database, the proposed algorithm builds an index that consists of pre-computed links of each point in the database. The search algorithm utilizes the index to efficiently traverse the database in search of the nearest neighbor. We also explore the role of medical image feature representation in content-based medical image retrieval tasks. We propose a Transformer-based feature representation technique that outperformed the existing pre-trained Transformer-based approaches on benchmark medical image retrieval datasets. We extensively tested the proposed NNS approach and compared the performance with state-of-the-art NNS approaches on benchmark datasets and our created medical image datasets. The proposed approach outperformed the existing approaches in terms of retrieving accurate neighbors and retrieval speed. In comparison to the existing approximate NNS approaches, our proposed DLS approach outperformed them in terms of lower average time per query and ≥99% R@10 on 11 out of 13 benchmark datasets. We also found that the proposed medical feature representation approach is better for representing medical images compared to the existing pre-trained image models. The proposed feature extraction strategy obtained an improvement of 9.37%, 7.0%, and 13.33% in terms of P@5, P@10, and P@20, respectively, in comparison to the best-performing pre-trained image model. The source code and datasets of our experiments are available at https://github.com/deepaknlp/DLS.
Read full abstract