Capsule endoscopy (CE) has become the preferred method of investigating the small intestine [1, 2]. Although capsule administration is a relatively straightforward task in the absence of medical complications, reading CE videos is laborious, and the outcome and reporting depend not only on the reviewer's attentiveness and expertise but also on several other specific perceptual and interpretational factors [3, 4]. When readers view a soporific stream of often repetitive, nondistinct images in a quiet, dark room, loss of concentration is a significant risk and can lead to inaccurate reporting of findings [4, 5]. Nevertheless, practicing gastroenterologists may be considered adequately trained in CE reporting after a short, 1-day training program [6]. Moreover, formal training in CE during gastrointestinal (GI) fellowship is only loosely defined: it includes completion of a hands-on course with a minimum of 8 h of continuing medical education (CME) credit, followed by review of CE studies by a credentialed capsule endoscopist [6]. There is currently no standardization of national or international training programs, although guidelines are being developed [7]. Furthermore, only limited evidence-based information on the optimal reading mode for CE review is currently available [3, 4, 8]. The past has taught us a great deal about medical image perception, not only in "classical" image-based specialties such as radiology and pathology, but also in other clinical specialties that use imaging technology, such as gastroenterology, laparoscopic surgery, and dermatology [9–11]. Medical images and videos represent a significant source of information that aids clinicians in diagnostic and therapeutic decisions [11].
Yet the correct interpretation of medical images, which consists of two basic processes, visual perception (image inspection) and cognition (rendering an interpretation), relies on a host of factors, and inaccurate interpretation carries significant health and medicolegal consequences [10, 11]. The use and development of computer-based models to predict human performance has also been a topic of interest; perception-oriented research in this area is scarce, yet the opportunities abound. The American Society for Gastrointestinal Endoscopy (ASGE) recommends a minimum of 20 supervised procedures to provide adequate experience for those intending to practice CE independently [6]. Commercially available software provides a diverse range of viewing modes (VM) and frame rates (FR), in addition to other image enhancement tools such as digital chromoendoscopy [3, 12, 13]. According to a number of studies, no consensus has been reached on the latter technique, and its optimal mode of application has yet to be determined [2, 3]. The use of differing VM has, to date, been the subject of only two studies [14, 15]. In the most recent large cohort study, Zheng et al. [15] reported that the low lesion detection rates observed were not influenced by increasing CE experience. Detection rates were significantly higher when reading in single VM/FR15 (single screen at 15 frames/s) and quad VM/FR20 (four screens at 20 frames/s) than when reading in single VM/FR25 (single screen at 25 frames/s). Increasing the viewing speed in quad VM from FR20 to FR30 appeared to have no significant effect on detection ability. The investigators therefore suggested that quality control measures to compare and improve lesion detection rates need further study.

A. Koulaouzidis, Endoscopy Unit, Royal Infirmary of Edinburgh, 51 Little France Crescent, Edinburgh EH16 4SA, Scotland, UK. e-mail: Tassos.Koulaouzidis@luht.scot.nhs.uk URL: http://www.drkoulaouzidis.com