This paper proposes a method for the automatic detection of 3D-display-friendly scenes in video sequences. Manual selection of such scenes by a human user would be extremely time-consuming and would require additional evaluation of the result on a 3D display. The input videos can be captured intentionally or taken from other sources, such as films. First, the input video is analyzed and the camera trajectory is estimated. The optimal frame sequence, which follows defined rules based on the optical attributes of the display, is then extracted. This ensures the best visual quality and viewing comfort. Identifying the correct focusing distance is the next important step in producing a sharp and artifact-free result on a 3D display. Two novel and equally efficient focus metrics for 3D displays are proposed and evaluated. Further scene enhancements are proposed to correct unsuitably captured video. Multiple image analysis approaches used in the proposal are compared in terms of both quality and time performance. The proposal is experimentally evaluated on a state-of-the-art 3D display by Looking Glass Factory and is also suitable for other multi-view devices. The problem of optimal scene detection, which includes input frame extraction, resampling, and focusing, has not been addressed in previous research. The individual stages of the proposal were compared with existing methods, and the results show that the proposed scheme is optimal and cannot be replaced by other state-of-the-art approaches.