Gestalt perception refers to the cognitive ability to perceive various elements as a unified whole. In this study, we examine Gestalt recognition in visual cubist art, a transformative process culminating in what is often described as an Aha moment. This Aha moment signifies a sudden understanding of what is seen, in which seemingly disparate elements merge into a coherent, meaningful picture. The onset of the Aha moment can vary: it may appear almost instantaneously, in line with theories of hedonic fluency, or it may emerge after a period of time, supporting the concept of delayed but more in-depth, meaningful insight. We employed pupillometry to measure cognitive and affective shifts during art interaction, analyzing both maximum pupil dilation and average dilation across the trial. The study consisted of two parts. In part 1, 84 participants identified faces in cubist paintings under various conditions while Aha moments and pupil dilation were measured. In part 2, the same 84 participants rated the artworks in a no-task, free-viewing condition. Results of part 1 indicate a distinctive pattern of pupil dilation, with maximum dilation occurring at both trial onset and trial end. Longer response times were observed for high-fluent, face-present stimuli, consistent with a delayed but accurate Aha moment through recognition. Additionally, the time of maximum pupil dilation, rather than average dilation, showed significant associations: it occurred later for high-fluent, face-present stimuli and for correct detections. In part 2, average dilation, not the time of maximum pupil dilation, emerged as the significant factor. Face stimuli and highly accessible art evoked stronger dilations, which also corresponded to high clearness and negative valence ratings. The study underscores a complex relationship between the timing of recognition and the Aha moment, suggesting nuanced differences in emotional and cognitive responses during art viewing.
Pupil dilation measures offer insight into these processes, especially for moments of recognition, though their application in evaluating emotional responses through artwork ratings warrants further exploration.
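
The three pupil measures named above (maximum dilation, the time of maximum dilation, and average dilation across the trial) can be illustrated with a minimal sketch. This is not the analysis pipeline used in the study. It assumes a baseline-corrected trace sampled at a fixed rate, with missing samples (for example, blinks) marked as `None`. The function name `summarize_trial` and its parameters are hypothetical.

```python
from statistics import fmean
from typing import Optional, Sequence


def summarize_trial(trace: Sequence[Optional[float]], sample_rate_hz: float) -> dict:
    """Summarize one trial's pupil trace (illustrative only).

    Assumptions: `trace` holds baseline-corrected pupil sizes sampled at a
    fixed rate, and None marks a missing sample such as a blink.
    """
    # Keep each valid sample together with its index so timing is preserved.
    valid = [(i, v) for i, v in enumerate(trace) if v is not None]
    if not valid:
        raise ValueError("trace contains no valid samples")

    # On ties, max() returns the earliest of the equally large samples.
    peak_index, peak_value = max(valid, key=lambda pair: pair[1])

    return {
        "max_dilation": peak_value,
        "time_of_max_s": peak_index / sample_rate_hz,
        "mean_dilation": fmean(v for _, v in valid),
    }


# Toy trace of five samples at 4 Hz with one missing sample.
summary = summarize_trial([0.0, 0.25, None, 0.5, 0.25], sample_rate_hz=4.0)
```

In this sketch the time of maximum dilation is taken from the index of the original sample, so dropped samples do not shift the time axis. The average is computed over valid samples only.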