We propose a method of acoustic scene analysis for intermittently missing observations, which analyzes acoustic scenes and restores missing observations simultaneously on the basis of the temporal correlation between acoustic words. One effective strategy for analyzing acoustic scenes is to characterize them as a combination of acoustic words. The acoustic topic model (ATM) is one such technique; it models the process that generates multiple acoustic words. Here, an acoustic word corresponds to a sound category, has a homogeneous duration, and is defined frame by frame. The ATM assumes that all acoustic words are observed, and therefore it cannot be applied if any acoustic observations are missing. In practice, however, acoustic observations are sometimes missing because of poor recording conditions, transmission loss, or privacy reasons. In the proposed method, focusing on the fact that acoustic words are temporally correlated, we consider the transition of acoustic words in two ways: first, by modeling the temporal transition of acoustic words directly using a Markov process, and second, by modeling the temporal transition of hidden states that generate acoustic words using a hidden Markov model. We then incorporate each transition model into the ATM-based process that generates acoustic words. The proposed method thus allows us to analyze acoustic scenes from acoustic words while restoring the missing ones. In our experiments, the proposed method achieved an acoustic scene classification accuracy close to that obtained with no missing observations, even when 50% of the observations were missing. Moreover, the model considering the hidden-state transition classified acoustic scenes more accurately than the model considering the acoustic word transition directly.
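
To illustrate the intuition behind the first transition model, the following minimal Python sketch restores missing acoustic words with a first-order Markov chain alone. It is not the proposed method: the ATM, the hidden Markov variant, and joint inference of scenes and missing words are all omitted, and the greedy fill is a simplification chosen for brevity. The function names `estimate_transitions` and `restore_missing`, the smoothing constant `alpha`, the use of `None` to mark a missing frame, and the toy sequence are all assumptions introduced here for illustration.

```python
def estimate_transitions(words, vocab_size, alpha=1.0):
    """Estimate first-order transition probabilities between acoustic words.

    Only pairs of consecutive frames that are both observed are counted.
    `alpha` is an additive smoothing constant (an assumption of this sketch).
    """
    counts = [[alpha] * vocab_size for _ in range(vocab_size)]
    for prev, cur in zip(words, words[1:]):
        if prev is not None and cur is not None:
            counts[prev][cur] += 1.0
    # Normalize each row so that it sums to one.
    return [[c / sum(row) for c in row] for row in counts]


def restore_missing(words, trans):
    """Greedily replace each missing word (None) with the most probable
    successor of the preceding word. A missing first frame is left as None."""
    restored = list(words)
    for t, w in enumerate(restored):
        if w is None and t > 0 and restored[t - 1] is not None:
            row = trans[restored[t - 1]]
            restored[t] = max(range(len(row)), key=row.__getitem__)
    return restored


# Toy sequence of acoustic word indices over time frames; None = missing.
observed = [0, 1, 0, 1, None, 1, 0, None, 0, 1]
trans = estimate_transitions(observed, vocab_size=2)
restored = restore_missing(observed, trans)
```

In this toy sequence the two words alternate, so the estimated transitions favor switching words and both gaps are filled accordingly. A restoration closer in spirit to the proposed method would also account for the following frame and for the scene-dependent word distribution, rather than relying on the preceding word only.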