Anomalous melt pools during metal additive manufacturing (AM) can degrade mechanical and fatigue performance. In-situ monitoring of the subsurface melt pool morphology requires specialized equipment that may not be readily accessible or scalable. We therefore introduce a machine learning framework that correlates in-situ two-color thermal images, acquired via high-speed color imaging, with the two-dimensional profile of the melt pool cross-section. We employ a hybrid CNN-Transformer architecture to relate single-bead, off-axis thermal image sequences to melt pool cross-section contours measured via optical microscopy. Specifically, a ResNet model embeds the spatial information contained in each thermal image into a latent vector, while a Transformer model operates on the sequence of embedded vectors to extract temporal information. The performance of this model is evaluated through dimensional and geometric comparisons with the corresponding experimental no-powder melt pool observations. Our framework is able to model the curvature of the subsurface melt pool structure, with improved performance in high-energy-density regimes compared to analytical models. Additionally, the use of ratiometric temperature estimates improves the accuracy of the model predictions compared to monochromatic imaging. This work establishes a framework that can be extended to powder-based AM builds.
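
To make the notion of a ratiometric temperature estimate concrete, the following is a minimal sketch of standard two-color pyrometry, not necessarily the procedure used in this work. It assumes the Wien approximation to Planck's law and a graybody surface (equal emissivity in both channels), so that the ratio of intensities at two wavelengths depends on temperature but not on the absolute emissivity. The function names and the channel wavelengths (620 nm and 540 nm) are hypothetical and chosen only for illustration; a real color camera would additionally require integration over each channel's spectral response and a calibration step.

```python
import math

# Second radiation constant c2 = h*c/k_B, in metre-kelvin (CODATA value).
C2 = 1.438776877e-2


def wien_intensity(wavelength_m, temperature_k, emissivity=1.0):
    """Spectral intensity under the Wien approximation (constant prefactor dropped)."""
    return emissivity * wavelength_m ** -5 * math.exp(-C2 / (wavelength_m * temperature_k))


def ratiometric_temperature(ratio, lambda1_m, lambda2_m, emissivity_ratio=1.0):
    """Invert the intensity ratio I(lambda1) / I(lambda2) for temperature in kelvin.

    Assumes the Wien approximation. emissivity_ratio is eps1 / eps2,
    which equals 1.0 under the graybody assumption.
    """
    denom = (
        math.log(ratio)
        - math.log(emissivity_ratio)
        - 5.0 * math.log(lambda2_m / lambda1_m)
    )
    return C2 * (1.0 / lambda2_m - 1.0 / lambda1_m) / denom


# Hypothetical channel centre wavelengths (assumptions, not from the source).
LAMBDA_RED = 620e-9
LAMBDA_GREEN = 540e-9

# Round trip: synthesise a two-channel ratio at a known temperature, then recover it.
true_temperature = 2500.0  # K, illustrative value only
ratio = (
    wien_intensity(LAMBDA_RED, true_temperature)
    / wien_intensity(LAMBDA_GREEN, true_temperature)
)
recovered = ratiometric_temperature(ratio, LAMBDA_RED, LAMBDA_GREEN)
```

Because the emissivity cancels in the ratio under the graybody assumption, the recovered temperature is insensitive to a common scaling of both channels, which is one plausible reason a ratiometric input could be more informative than a single monochromatic intensity.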