Stereo image compression (SIC) aims to jointly compress a pair of left and right stereoscopic images, which can achieve higher compression efficiency than compressing each image independently. In this paper, to benefit SIC research, we collect a large real-world stereo image dataset, named Palace, which comprises hundreds of high-resolution stereo image pairs. More importantly, we propose a novel mask stereo image compression network, named MASIC, which jointly compresses the stereo images with high compression efficiency. Specifically, we first estimate the homography matrix between the stereo images with a regression model. The left image is then spatially transformed by the homography matrix, so that only the residual information needs to be encoded for the right image. To avoid incorrect guidance between the two images of a stereo pair, we propose a mask prediction module (MPM) that generates a multi-channel guided mask to steer both the encoding and decoding processes. Based on the guided mask, we introduce a new mask conditional stereo entropy (MCSE) model to fully exploit the correlation between the stereo images in entropy coding. In the decoder, we develop a stereo decoding module that simultaneously decodes the stereo images and enhances their quality. Experimental results show that MASIC significantly advances the performance of SIC both quantitatively and qualitatively on a variety of datasets, and is robust to changes in the parallax level between stereo images. The source code is available at https://github.com/eecoder-dyf/MASIC.
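
To make the warp-then-residual idea concrete, the following is a minimal NumPy sketch and not the MASIC implementation. It assumes a grayscale image and a homography that is already known, whereas MASIC regresses the homography with a network. It uses nearest-neighbour sampling, and a hard binary validity mask stands in for the learned multi-channel guided mask. The function names `warp_homography` and `masked_residual` are hypothetical and do not come from the released code.

```python
import numpy as np


def warp_homography(image, H):
    """Inverse-warp a 2-D grayscale image by a 3x3 homography H.

    H maps left-image pixel coordinates (x, y, 1) to right-image coordinates.
    Nearest-neighbour sampling is used for simplicity (an assumption of this
    sketch). Returns the warped image and a boolean mask that is False where
    the source pixel falls outside the left image.
    """
    h, w = image.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Homogeneous coordinates of every target pixel, shape 3 x (h*w).
    target = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])
    # Map each target pixel back to its source location in the left image.
    src = np.linalg.inv(H) @ target
    # Assumes the third homogeneous coordinate is non-zero for all pixels.
    sx = np.rint(src[0] / src[2]).astype(int)
    sy = np.rint(src[1] / src[2]).astype(int)
    valid = (sx >= 0) & (sx < w) & (sy >= 0) & (sy < h)
    warped = np.zeros(h * w, dtype=image.dtype)
    warped[valid] = image[sy[valid], sx[valid]]
    return warped.reshape(h, w), valid.reshape(h, w)


def masked_residual(left, right, H):
    """Residual of the right image with respect to the warped left image.

    The returned boolean mask marks pixels where the warped left image is a
    usable prediction; elsewhere the residual equals the right image itself.
    """
    warped, valid = warp_homography(left, H)
    residual = right.astype(np.float64) - warped.astype(np.float64)
    return residual, valid
```

For a pure horizontal shift, the residual vanishes wherever the mask is true, and only the newly exposed column still has to be coded in full. This is the intuition behind encoding the residual instead of the right image.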