Music is omnipresent in daily life and may interact with critical cognitive processes, including memory. Although music accompanies diverse daily activities such as studying, commuting, and working, the existing literature has yielded mixed results as to whether music improves or impairs memory for information experienced in parallel. To elucidate how music memory and its predictive structure modulate the encoding of novel information, we developed a cross-modal sequence learning task in which participants learned sequences of abstract shapes, each accompanied by paired music. Our goal was to investigate whether familiar and structurally regular music could provide a "temporal schema" (rooted in the organized and hierarchical structure of music) that enhances the acquisition of parallel, temporally ordered visual information. Results revealed a complex interplay between music familiarity and structural regularity in the learning of paired visual sequences. Notably, compared to a control condition, listening to well-learned, regularly structured music (music with high predictability) significantly facilitated visual sequence encoding, yielding faster learning and faster retrieval. Conversely, learned but irregular music (where music memory violated musical syntax) significantly impaired sequence encoding. While these findings supported our mechanistic framework, unlearned irregular music, characterized by the lowest predictability, intriguingly also enhanced memory. In conclusion, this study demonstrates that concurrent music can modulate visual sequence learning and that the effect depends on the interaction between music familiarity and regularity, offering insights into potential applications for enhancing human memory.