A speech intelligibility (SI) prediction model is proposed that includes an auditory preprocessing component based on the physiological anatomy and activity of the human ear, a hierarchical spiking neural network, and a decision back-end processing based on correlation analysis. The auditory preprocessing component effectively captures advanced physiological details of the auditory system, such as retrograde traveling waves, longitudinal coupling, and cochlear nonlinearity. The ability of the model to predict data from normal-hearing listeners under various additive noise conditions was considered. The predictions closely matched the experimental test data under all conditions. Furthermore, we developed a lumped mass model of a McGee stainless-steel piston with the middle-ear to study the recovery of individuals with otosclerosis. We show that the proposed SI model accurately simulates the effect of middle-ear intervention on SI. Consequently, the model establishes a model-based relationship between objective measures of human ear damage, like distortion product otoacoustic emissions, and speech perception. Moreover, the SI model can serve as a robust tool for optimizing parameters and for preoperative assessment of artificial stimuli, providing a valuable reference for clinical treatments of conductive hearing loss.