Five structural equation models (SEMs) were developed to examine explicitly the relative contributions of experiential factors (subjective frequency and word age of acquisition) and word similarity factors (lexical equivalence class size and phoneme equivalence class size) to word identification accuracy (WIA). WIA (percent word correct scores and word cost scores) was measured for 184 monosyllabic words under five presentation conditions: (1) visual-only speech; (2) high-intelligibility vocoded auditory-only speech; (3) low-intelligibility vocoded auditory-only speech; (4) high-intelligibility vocoded audiovisual speech; and (5) low-intelligibility vocoded audiovisual speech. The results showed that each factor can be treated in isolation and measured explicitly. Furthermore, the relative strengths of the factors' contributions varied as a function of intelligibility and the estimation power of the predictor variables. In addition, two models were developed to estimate audiovisual WIA (conditions 4 and 5) from experiential, visual-similarity, and auditory-similarity factors. Only the SEM for condition 4 fit the data; in this model, the experiential and auditory factors contributed equally to WIA, whereas the visual factor contributed little. Advantages and limitations of SEM are discussed.