Background: Endoscopic procedures are performed by people with variable skill sets, and endoscopic results are interpretations of visual observations by people with variable reference sets. Differences in reference sets are difficult to assess, but differences in skill sets can be measured indirectly, for instance as average total time per procedure, frequency of cecal intubation, and frequency and type of complications. No system exists that automatically captures and documents the findings of endoscopic procedures according to standards based on reference information. Therefore, direct endoscopy-to-endoscopy or patient-to-patient comparisons cannot be performed.
Goal: To develop a system that automatically documents and extracts findings for each step of an endoscopic procedure, allows comparison of procedures performed by different operators, and provides quality control as well as educational means to improve procedural skills.
Results: We created a capture system that combines the entire video stream of a colonoscopy with audio annotation (location information and comments) by the endoscopist, and records both in digital MPEG-2 format. Using this system, we created a digital multimedia database of over 200 complete, anonymized, audio-annotated colonoscopies. Next, we developed an automated video segmentation algorithm that uses speech recognition and natural language processing to extract location information (e.g., rectum, sigmoid) and comments from the audio track of the data stream, and that divides each MPEG-2 file into up to 13 scenes (rectum, sigmoid, descending, transverse, ascending, cecum, TI, cecum, ... rectum). Subsequently, we applied a new automated algorithm to remove non-informative or blurred images. At present we successfully segment nearly 9 out of 10 colonoscopies, and remove blurry frames (on average 37% of all frames) with an accuracy of 95%.
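As an illustration of the segmentation step described in the Results, the following is a minimal sketch, not the algorithm used in this work. It assumes that speech recognition has already produced time-stamped utterances, and it simply opens a new scene whenever a different location keyword is heard. The names LOCATION_KEYWORDS, Scene, and segment_by_location, as well as the keyword list and the sample transcript, are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical mapping from spoken phrases to scene labels (an assumption,
# not the vocabulary used by the system described above).
LOCATION_KEYWORDS = {
    "rectum": "rectum",
    "sigmoid": "sigmoid",
    "descending": "descending",
    "transverse": "transverse",
    "ascending": "ascending",
    "cecum": "cecum",
    "terminal ileum": "TI",
}


@dataclass
class Scene:
    location: str
    start: float  # seconds from the start of the video
    end: float    # seconds from the start of the video


def segment_by_location(utterances, video_end):
    """Split a video into scenes from time-stamped, recognized utterances.

    utterances: list of (timestamp_seconds, text), sorted by timestamp.
    video_end: duration of the video in seconds.
    """
    boundaries = []  # (timestamp, location) for each change of location
    for timestamp, text in utterances:
        lowered = text.lower()
        for phrase, label in LOCATION_KEYWORDS.items():
            if phrase in lowered:
                # Open a new scene only when the location actually changes.
                if not boundaries or boundaries[-1][1] != label:
                    boundaries.append((timestamp, label))
                break  # at most one location per utterance in this sketch

    scenes = []
    for i, (start, label) in enumerate(boundaries):
        # A scene ends where the next one starts, or at the end of the video.
        end = boundaries[i + 1][0] if i + 1 < len(boundaries) else video_end
        scenes.append(Scene(label, start, end))
    return scenes


# Hypothetical transcript fragment, for illustration only.
sample = [
    (0.0, "entering rectum"),
    (42.5, "sigmoid now"),
    (60.0, "still in the sigmoid, small polyp"),
    (130.0, "descending colon"),
]
scenes = segment_by_location(sample, video_end=200.0)
for scene in scenes:
    print(scene)
```

Because a location may be visited on both insertion and withdrawal, the sketch keys scenes on changes of location rather than on unique labels, so the same label can open more than one scene.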
Conclusions: We have created a digital multimedia database for colonoscopy, a method to segment digitized multimedia colonoscopy files into anatomic scenes, and a novel algorithm that removes blurry images with high accuracy. The current system provides the basic infrastructure for developing software tools for image analysis and content-based video retrieval, and for creating a distributed system able to capture procedure-related information from different geographic locations.