Abstract

Time domain pitch synchronous overlap-add (TD-PSOLA) is currently the most popular synthesis algorithm for text-to-speech (TTS) systems. The algorithm produces very high quality synthetic speech, particularly when the pitch modification factor is small. However, as the pitch modification factor becomes larger, the quality degradation caused by even slight pitch epoch detection errors becomes severe. The vocoder framework, on the other hand, allows very flexible prosody manipulation and maintains uniform voice quality over a wide range of prosody modification. Unfortunately, speech synthesized by a vocoder is far from natural human speech and often sounds buzzy. To remedy this buzziness and produce more natural synthetic speech, we propose a new speech synthesis algorithm for high quality TTS systems based on the homomorphic vocoder framework. The impulse response of the vocal tract is obtained by mixing the minimum phase in the lower frequency band with the original phase in the higher frequency band, hence the name mixed phase vocoder. Informal subjective listening tests indicate that the mixed phase vocoder is a good candidate for TTS synthesis with high intelligibility and naturalness. Copyright © 2004 AEI.
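
The phase-mixing idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `cutoff_bin` parameter, the hard switch between bands, and the use of the frame's own magnitude spectrum (rather than a cepstrally smoothed vocal tract envelope) are all assumptions made for illustration. The minimum phase is derived from the log magnitude by the standard cepstral folding used in homomorphic processing.

```python
import numpy as np


def minimum_phase(log_mag):
    """Minimum phase (radians) for a full-length log-magnitude spectrum.

    Uses cepstral folding; the length is assumed to be even.
    """
    n = len(log_mag)
    cep = np.fft.ifft(log_mag).real          # real cepstrum
    fold = np.zeros(n)
    fold[0] = cep[0]
    fold[1:n // 2] = 2.0 * cep[1:n // 2]     # double the causal part
    fold[n // 2] = cep[n // 2]
    return np.fft.fft(fold).imag             # imaginary part = phase


def mixed_phase_response(frame, cutoff_bin):
    """Hypothetical helper: minimum phase below cutoff_bin, original phase above.

    frame      : real-valued speech frame of even length (assumption)
    cutoff_bin : FFT bin index separating the two bands (assumption)
    """
    n = len(frame)
    spec = np.fft.fft(frame)
    mag = np.abs(spec)
    log_mag = np.log(np.maximum(mag, 1e-10))  # floor avoids log(0)

    phase_min = minimum_phase(log_mag)
    phase_orig = np.angle(spec)

    # Symmetric bin distance from DC keeps the spectrum conjugate-symmetric.
    k = np.arange(n)
    dist = np.minimum(k, n - k)
    phase = np.where(dist < cutoff_bin, phase_min, phase_orig)

    mixed = mag * np.exp(1j * phase)
    return np.fft.ifft(mixed).real
```

In this sketch the magnitude spectrum is left untouched and only the phase changes, so `cutoff_bin = 0` returns the original frame, while a cutoff above the Nyquist bin yields a fully minimum-phase response. A practical system would probably cross-fade between the two phases around the cutoff rather than switch abruptly.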