The perceptual quality of stereoscopic images plays an essential role in the human perception of visual information. However, most available stereoscopic image quality assessment (SIQA) methods evaluate the 3D visual experience using hand-crafted features or shallow architectures, which cannot adequately model the visual properties of stereo images. In this paper, we use convolutional neural networks (CNNs) to learn deeper local quality-aware structures for stereo images. Two CNN models with different inputs are designed for no-reference (NR) SIQA tasks: a one-column CNN model that directly accepts the cyclopean view as input, and a three-column CNN model that jointly considers the cyclopean, left, and right views as CNN inputs. The two SIQA frameworks share the same implementation approach. First, to overcome the obstacle of limited SIQA datasets, we use image patches cropped from the corresponding stereopairs as inputs for local quality-sensitive feature extraction. Next, a local feature selection algorithm removes the features of non-salient patches, which could otherwise cause large prediction errors. Finally, the retained local visual structures of salient regions are aggregated into a final quality score in an end-to-end manner. Experimental results on three public SIQA databases demonstrate that our method outperforms most state-of-the-art NR SIQA methods. The results of a cross-database experiment further show the robustness and generality of the proposed method.
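The patch-based pipeline described above (crop patches, score each patch locally, discard non-salient patches, aggregate the rest) can be sketched as follows. This is a minimal illustration, not the paper's method: every function name here is hypothetical, the CNN's learned quality feature is replaced by a simple local-contrast stand-in, and saliency is replaced by mean intensity purely for demonstration.

```python
import numpy as np

def crop_patches(img, patch_size=32, stride=32):
    """Crop non-overlapping square patches from a 2-D image (hypothetical helper)."""
    h, w = img.shape
    patches = []
    for y in range(0, h - patch_size + 1, stride):
        for x in range(0, w - patch_size + 1, stride):
            patches.append(img[y:y + patch_size, x:x + patch_size])
    return np.stack(patches)

def patch_quality(patch):
    # Stand-in for the CNN's local quality-aware feature:
    # here just the patch's standard deviation (local contrast).
    return patch.std()

def patch_saliency(patch):
    # Stand-in for a saliency measure; a real system would use a saliency model.
    return patch.mean()

def predict_score(img, keep_ratio=0.5):
    """Toy end-to-end pipeline: patch scores -> saliency selection -> aggregation."""
    patches = crop_patches(img)
    q = np.array([patch_quality(p) for p in patches])
    s = np.array([patch_saliency(p) for p in patches])
    # Local feature selection: keep only the most salient patches,
    # removing non-salient ones that could distort the prediction.
    k = max(1, int(len(patches) * keep_ratio))
    keep_idx = np.argsort(s)[-k:]
    # Aggregate the retained local scores into a single global quality score.
    return float(q[keep_idx].mean())
```

In the actual models, `patch_quality` corresponds to the CNN regressor applied to each cyclopean (or cyclopean/left/right) patch, and the selection and aggregation steps are part of the trained network rather than fixed heuristics.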