Neural representations induced by naturalistic stimuli offer insights into how humans respond to stimuli in daily life. Understanding neural mechanisms underlying naturalistic stimuli processing hinges on the precise identification and extraction of the shared neural patterns that are consistently present across individuals. Targeting the Electroencephalogram (EEG) technique, known for its rich spatial and temporal information, this study presents a framework for Contrastive Learning of Shared SpatioTemporal EEG Representations across individuals (CL-SSTER). CL-SSTER utilizes contrastive learning to maximize the similarity of EEG representations across individuals for identical stimuli, contrasting with those for varied stimuli. The network employs spatial and temporal convolutions to simultaneously learn the spatial and temporal patterns inherent in EEG. The versatility of CL-SSTER was demonstrated on three EEG datasets, including a synthetic dataset, a natural speech comprehension EEG dataset, and an emotional video watching EEG dataset. CL-SSTER attained the highest inter-subject correlation (ISC) values compared to the state-of-the-art ISC methods. The latent representations generated by CL-SSTER exhibited reliable spatiotemporal EEG patterns, which can be explained by properties of the naturalistic stimuli. CL-SSTER serves as an interpretable and scalable framework for the identification of inter-subject shared neural representations in naturalistic neuroscience.