Is it possible and useful to compute STOI from Mel-spectrograms? If so, how?
Answer from top 10 papers
The Short-Time Objective Intelligibility (STOI) measure predicts the intelligibility of a degraded speech signal by comparing it with a clean reference, and it is typically computed on time-frequency representations of speech such as spectrograms (Fazeel et al., 2023; Fuh et al., 2020). The standard STOI algorithm does not use Mel-spectrograms: it groups short-time DFT bins into one-third octave bands rather than Mel bands. A Mel-spectrogram is a spectrogram whose frequency axis is warped to the perceptually motivated Mel scale. The literature nevertheless suggests that speech intelligibility prediction methods, including STOI, can work with different time-frequency representations of speech (Jensen & Taal, 2016).
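To make the difference between the two frequency layouts concrete, here is a minimal sketch that lists one-third octave band centers next to Mel-spaced centers over the same range. It assumes the commonly described STOI configuration of 15 one-third octave bands with a lowest center frequency of 150 Hz (a detail not stated in the summary above) and the HTK-style Mel formula. The function names are illustrative and do not come from any particular library.

```python
import math

def third_octave_centers(lowest_hz=150.0, num_bands=15):
    """Center frequencies spaced one third of an octave apart."""
    return [lowest_hz * 2.0 ** (k / 3.0) for k in range(num_bands)]

def hz_to_mel(f_hz):
    """HTK-style Hz to Mel conversion."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_centers(low_hz, high_hz, num_bands):
    """Center frequencies spaced uniformly on the Mel scale."""
    lo, hi = hz_to_mel(low_hz), hz_to_mel(high_hz)
    step = (hi - lo) / (num_bands - 1)
    return [mel_to_hz(lo + i * step) for i in range(num_bands)]

stoi_bands = third_octave_centers()
mel_bands = mel_centers(stoi_bands[0], stoi_bands[-1], len(stoi_bands))

# Print both layouts side by side: same endpoints, different spacing.
for s, m in zip(stoi_bands, mel_bands):
    print(f"one-third octave: {s:8.1f} Hz   Mel-spaced: {m:8.1f} Hz")
```

Both layouts share the same endpoints here, but the Mel-spaced centers sit higher in the middle of the range, so a Mel-spectrogram cannot simply be relabelled as STOI's band decomposition.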
Interestingly, Andersen et al. (2018) introduce an approach that transforms Mel-spectrograms into 3D data for underwater acoustic target classification, indicating that Mel-spectrograms can be manipulated and potentially used in advanced intelligibility prediction models. Additionally, Gul et al. (2024) highlight the effectiveness of Mel-spectrogram features in speech enhancement for hearing-impaired scenarios, which suggests that Mel-spectrograms could be relevant to intelligibility prediction.
In summary, STOI is not traditionally computed from Mel-spectrograms, but the literature indicates that Mel-spectrogram features can be used effectively in speech enhancement and potentially in intelligibility prediction models. It is therefore plausible that a STOI-style score could be computed from Mel-spectrograms, provided the representation is adapted to the STOI algorithm's requirements, for example by using linear-magnitude Mel band envelopes of time-aligned clean and degraded signals in place of the usual band decomposition. This could be useful in scenarios where perceptual weighting is beneficial, such as hearing-impaired speech enhancement (Gul et al., 2024). However, further research would be needed to develop and validate such an approach.
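One possible adaptation can be sketched as follows. This is not the standard STOI algorithm and not a validated method. It applies only STOI's final stage (per-segment normalization, clipping, and correlation) to whatever band envelopes it is given. It assumes the inputs are linear-magnitude (not log or dB) Mel-spectrograms of time-aligned clean and degraded signals, stored as one list of frame values per band. The segment length of 30 frames and the clipping bound of -15 dB follow the usual STOI description. The function names are hypothetical, and only the standard library is used so the sketch runs without extra dependencies.

```python
import math

def _correlation(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if undefined)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    xc = [v - mean_x for v in x]
    yc = [v - mean_y for v in y]
    denom = math.sqrt(sum(v * v for v in xc) * sum(v * v for v in yc))
    if denom == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(xc, yc)) / denom

def stoi_like_score(clean, degraded, seg_len=30, beta_db=-15.0):
    """STOI-style score from band envelopes (assumed Mel magnitudes).

    clean, degraded: lists of bands, each a list of non-negative
    magnitudes per frame, with identical shapes.
    """
    if len(clean) != len(degraded) or not clean:
        raise ValueError("inputs must have the same, non-zero number of bands")
    # Clipping factor used by STOI to bound the degraded envelope.
    clip = 1.0 + 10.0 ** (-beta_db / 20.0)
    scores = []
    for x_band, y_band in zip(clean, degraded):
        if len(x_band) != len(y_band):
            raise ValueError("bands must have the same number of frames")
        # Slide a window of seg_len frames along each band.
        for end in range(seg_len, len(x_band) + 1):
            x = x_band[end - seg_len:end]
            y = y_band[end - seg_len:end]
            norm_x = math.sqrt(sum(v * v for v in x))
            norm_y = math.sqrt(sum(v * v for v in y))
            if norm_x == 0.0 or norm_y == 0.0:
                continue  # skip silent segments (a simplification)
            # Scale the degraded segment to the clean segment's energy,
            # then clip it relative to the clean envelope.
            alpha = norm_x / norm_y
            y_clipped = [min(alpha * yv, clip * xv) for xv, yv in zip(x, y)]
            scores.append(_correlation(x, y_clipped))
    if not scores:
        raise ValueError("not enough non-silent frames for one segment")
    return sum(scores) / len(scores)

# Synthetic stand-ins for Mel magnitudes: 3 bands, 60 frames.
num_bands, num_frames = 3, 60
clean = [[1.0 + math.sin(0.3 * t + b) for t in range(num_frames)]
         for b in range(num_bands)]
noisy = [[c + 2.0 * abs(math.cos(1.7 * t * (b + 1)))
          for t, c in enumerate(band)]
         for b, band in enumerate(clean)]

clean_score = stoi_like_score(clean, clean)
noisy_score = stoi_like_score(clean, noisy)
print(clean_score, noisy_score)
```

Because the sketch skips STOI's resampling, silent-frame removal, and one-third octave band grouping, its scores are not comparable to published STOI values. Whether they track intelligibility at all would have to be checked against listening-test data.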
Source Papers