Abstract

Merging and interactions can radically transform galaxies. However, identifying these events based solely on structure is challenging, as the status of observed mergers is not easily accessible. Fortunately, cosmological simulations are now able to produce more realistic galaxy morphologies, allowing us to directly trace galaxy transformation throughout the merger sequence. To bring observational analysis closer to what is possible in simulations, we introduce Mummi (MUlti Model Merger Identifier), a supervised deep learning framework that combines Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs). Mummi is trained on realism-added synthetic data from IllustrisTNG100-1 and comprises a multi-step ensemble of models that first separates mergers from non-mergers and then classifies the mergers as interacting pairs or post-mergers. To train this ensemble, we generate a large imaging dataset of 6.4 million images targeting UNIONS with RealSimCFIS. We show that Mummi offers a significant improvement over many previous machine learning classifiers, achieving 95% pure classifications even at Gyr-long timescales when using a jury-based decision-making process, which mitigates the class imbalance issues that arise when identifying real galaxy mergers from z = 0 to 0.3. Additionally, we can divide the identified mergers into pairs and post-mergers with a 96% success rate. We reduce the false positive rate in galaxy merger samples by 75%. By applying Mummi to the UNIONS DR5-SDSS DR7 overlap, we report a catalog of 13,448 high-confidence galaxy merger candidates. Finally, we demonstrate that Mummi produces powerful representations using supervised learning alone, which can be used to bridge galaxy morphologies in simulations and observations.
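
As an illustration of the two-step, jury-based classification described above, the following is a minimal sketch and not the actual Mummi implementation. It assumes each model in the ensemble returns a probability per galaxy, that a galaxy is labelled a merger only when enough "jurors" agree (unanimity by default), and that the second step averages the probabilities of the stage-two models. The function names, the 0.5 threshold, and the voting rule are all hypothetical choices made for this example.

```python
from typing import Optional, Sequence


def jury_vote(probs: Sequence[float],
              threshold: float = 0.5,
              min_votes: Optional[int] = None) -> bool:
    """Return True if at least `min_votes` jurors score above `threshold`.

    Assumption: with min_votes=None the jury must be unanimous, which
    trades completeness for purity.
    """
    if not probs:
        raise ValueError("need at least one juror")
    if min_votes is None:
        min_votes = len(probs)  # unanimous decision by default
    votes = sum(p > threshold for p in probs)
    return votes >= min_votes


def classify(merger_probs: Sequence[float],
             post_merger_probs: Sequence[float],
             threshold: float = 0.5,
             min_votes: Optional[int] = None) -> str:
    """Hypothetical two-step decision.

    Step 1: merger vs. non-merger, decided by the jury.
    Step 2: pair vs. post-merger, decided by the mean stage-two probability.
    """
    if not jury_vote(merger_probs, threshold, min_votes):
        return "non-merger"
    mean_pm = sum(post_merger_probs) / len(post_merger_probs)
    return "post-merger" if mean_pm > threshold else "pair"


if __name__ == "__main__":
    # Three stage-one jurors, two stage-two models (made-up scores).
    print(classify([0.9, 0.8, 0.7], [0.2, 0.3]))
    print(classify([0.9, 0.4, 0.7], [0.9, 0.9]))
    print(classify([0.9, 0.4, 0.7], [0.9, 0.9], min_votes=2))
```

Requiring more jurors to agree lowers the false positive rate at the cost of missing some true mergers, which is the purity-oriented trade-off a rare-class problem such as merger identification tends to favour.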