Self-supervised learning has shown great promise for training deep learning (DL) magnetic resonance imaging (MRI) reconstruction methods without fully sampled data. Current self-supervised learning methods for physics-guided reconstruction networks split the acquired undersampled data into two disjoint sets, where one is used for data consistency (DC) in the unrolled network, while the other is used to define the training loss. In this study, we propose an improved self-supervised learning strategy that more efficiently uses the acquired data to train a physics-guided reconstruction network without a database of fully sampled data. The proposed multi-mask self-supervised learning via data undersampling (SSDU) applies a holdout masking operation on the acquired measurements to split them into multiple pairs of disjoint sets for each training sample, using one set of each pair for the DC units and the other for defining the loss, thereby making more efficient use of the undersampled data. Multi-mask SSDU is applied to fully sampled 3D knee and prospectively undersampled 3D brain MRI datasets, at various acceleration rates and patterns, and compared with the parallel imaging method CG-SENSE, single-mask SSDU DL-MRI, and, when fully sampled data are available, supervised DL-MRI. The results on knee MRI show that the proposed multi-mask SSDU outperforms SSDU and performs as well as supervised DL-MRI. A clinical reader study further ranks multi-mask SSDU higher than supervised DL-MRI in terms of signal-to-noise ratio and aliasing artifacts. Results on brain MRI show that multi-mask SSDU achieves better reconstruction quality than SSDU. The reader study demonstrates that multi-mask SSDU at R = 8 significantly improves reconstruction compared with single-mask SSDU at R = 8, as well as CG-SENSE at R = 2.
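The multi-mask splitting described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name, the uniform-random selection of loss points (the paper uses a variable-density selection), and the `loss_frac` parameter are assumptions made for the example.

```python
import numpy as np

def multi_mask_split(acq_mask, num_masks=3, loss_frac=0.4, seed=0):
    """Split an acquired k-space sampling mask (Omega) into several
    disjoint (DC, loss) mask pairs, in the spirit of multi-mask SSDU.

    acq_mask : boolean array marking acquired k-space locations.
    Returns (dc_masks, loss_masks) with, for each pair k:
        dc_masks[k] | loss_masks[k] == acq_mask   (union is Omega)
        dc_masks[k] & loss_masks[k] == 0          (sets are disjoint)
    """
    rng = np.random.default_rng(seed)
    acq_idx = np.flatnonzero(acq_mask)          # indices of acquired points
    n_loss = int(loss_frac * acq_idx.size)      # size of the held-out loss set
    dc_masks, loss_masks = [], []
    for _ in range(num_masks):
        # Hold out a random subset of acquired points for the loss;
        # the remainder is used for data consistency in the unrolled network.
        loss_idx = rng.choice(acq_idx, size=n_loss, replace=False)
        loss = np.zeros(acq_mask.shape, dtype=bool)
        loss.flat[loss_idx] = True
        dc_masks.append(acq_mask & ~loss)
        loss_masks.append(loss)
    return dc_masks, loss_masks
```

During training, each of the `num_masks` pairs would yield one DC input and one loss target per training sample, so every acquired measurement is used in several roles across the pairs.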