Cryo-electron tomography allows the routine visualization of cellular landscapes in three dimensions at nanometer-range resolutions. When combined with single-particle tomography, it can yield near-atomic resolution structures of frequently occurring macromolecules within their native environment. Two outstanding challenges associated with cryo-electron tomography/single-particle tomography are the automatic identification and localization of proteins, tasks that are hindered by molecular crowding inside cells, the imaging distortions characteristic of tomograms and the sheer size of tomographic datasets. Current methods suffer from low accuracy, demand extensive and time-consuming manual labeling or are limited to the detection of specific types of proteins. Here, we present MiLoPYP, a two-step, dataset-specific, contrastive learning-based framework that enables fast molecular pattern mining followed by accurate protein localization. MiLoPYP’s ability to effectively detect and localize a wide range of targets, including globular and tubular complexes as well as large membrane proteins, will help streamline and broaden the applicability of high-resolution workflows for in situ structure determination.