Colorectal cancer (CRC) is a major global health concern, with microsatellite instability-high (MSI-H) being a defining characteristic of hereditary nonpolyposis colorectal cancer syndrome and affecting 15% of sporadic CRCs. Tumors with MSI-H have unique features and better prognosis compared to MSI-L and microsatellite stable (MSS) tumors. This study proposed establishing a MSI prediction model using more available and low-cost colonoscopy images instead of histopathology. The experiment utilized a database of 427 MSI-H and 1590 MSS colonoscopy images and vision Transformer (ViT) with different feature training approaches to establish the MSI prediction model. The accuracy of combining pre-trained ViT features was 84% with an area under the receiver operating characteristic curve of 0.86, which was better than that of DenseNet201 (80%, 0.80) in the experiment with support vector machine. The content-based image retrieval (CBIR) approach showed that ViT features can obtain a mean average precision of 0.81 compared to 0.79 of DenseNet201. ViT reduced the issues that occur in convolutional neural networks, including limited receptive field and gradient disappearance, and may be better at interpreting diagnostic information around tumors and surrounding tissues. By using CBIR, the presentation of similar images with the same MSI status would provide more convincing deep learning suggestions for clinical use.