The integration of visual letters and speech sounds is a crucial part of learning to read. Previous studies of this integration have shown that it is modulated by audiovisual (AV) congruency, a modulation commonly known as the congruency effect. To investigate how cortical oscillations in different frequency bands reflect the congruency effect, we conducted a Japanese priming task in which a visual letter was followed by a speech sound. We compared the power and phase properties of oscillatory activity in the theta and beta bands between congruent and incongruent letter-speech sound (L-SS) pairs. Our results revealed stronger theta-band (5-7 Hz) power in the congruent condition and cross-modal phase resetting within the auditory cortex, accompanied by enhanced inter-trial phase coherence (ITPC) in auditory-related areas in response to the congruent condition. The observed congruency effect on theta-band power may reflect increased neural activity in the left auditory region during L-SS integration. Additionally, the theta ITPC findings suggest that visual letters amplify neuronal responses to the following corresponding auditory stimulus, which may reflect differential cross-modal influences in the primary auditory cortex. In contrast, decreased beta-band (20-35 Hz) oscillatory power was observed in the right centroparietal regions for the congruent condition. The reduced beta power seems to be unrelated to AV integration, but may be interpreted as a brain response related to predicting sounds during language processing. Our data indicate that oscillations in different frequency bands contribute to distinct aspects of L-SS integration.
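
For readers unfamiliar with the phase measure named above, ITPC is conventionally defined as the length of the mean unit phase vector across trials: values near 1 mean the phase is nearly identical from trial to trial, and values near 0 mean it is not. The sketch below illustrates that standard definition only. It is not the analysis pipeline of this study; the function name `itpc`, the trial counts, and the synthetic phase data are all assumptions made for illustration, and in practice the phases would come from a time-frequency decomposition of band-limited (for example, theta-band) signals.

```python
import cmath
import math
import random


def itpc(phases_by_trial):
    """Inter-trial phase coherence at each time point.

    phases_by_trial: list of trials, each a list of phase angles in radians
    (one per time point). Returns one value in [0, 1] per time point.
    Hypothetical helper for illustration, not the study's own code.
    """
    n_trials = len(phases_by_trial)
    if n_trials == 0:
        raise ValueError("need at least one trial")
    n_times = len(phases_by_trial[0])
    result = []
    for t in range(n_times):
        # Average the unit-length phase vectors across trials;
        # the length of that average is the ITPC at time point t.
        mean_vector = sum(cmath.exp(1j * trial[t]) for trial in phases_by_trial) / n_trials
        result.append(abs(mean_vector))
    return result


if __name__ == "__main__":
    rng = random.Random(0)
    n_trials, n_times = 200, 5  # assumed sizes, chosen only for the demo
    # Synthetic "phase-locked" trials: phases cluster tightly around 0 rad.
    locked = [[rng.gauss(0.0, 0.3) for _ in range(n_times)] for _ in range(n_trials)]
    # Synthetic trials without phase locking: phases uniform on the circle.
    unlocked = [[rng.uniform(-math.pi, math.pi) for _ in range(n_times)] for _ in range(n_trials)]
    print("locked:  ", [round(v, 2) for v in itpc(locked)])
    print("unlocked:", [round(v, 2) for v in itpc(unlocked)])
```

Under this definition, a stimulus that resets ongoing phase to a consistent value on every trial raises ITPC even if signal power is unchanged, which is why power and ITPC are reported as separate measures.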