Diplophonia is a type of disordered voice in which two simultaneous pitches are perceived. In diplophonic voices, the vocal folds are most commonly divided into two parts that vibrate at different frequencies. The glottal area is the projected area of the space between the vocal folds, and its evolution over time is referred to as the glottal area waveform (GAW). For diplophonic voice, the GAW is modeled by superimposing two partial GAWs (pGAWs), i.e., trains of single-peak pulses with different pulse frequencies (fundamental frequencies, $f_o$s). In current kinematic models of diplophonic vocal fold vibration, the pGAWs are assumed to be quasiperiodic. This assumption is relaxed here by modulating pulse-to-pulse cycle length and amplitude. Both random and deterministic modulations are considered; deterministic modulations depend on the difference of the pGAWs' instantaneous phases. Model GAWs are fitted to input GAWs using an analysis-by-synthesis approach, which we refer to as 'modulated pulse trains decomposition' (MPD). MPD is shown to be applicable to diplophonic as well as nondiplophonic types of dysphonia, including multi-pulse patterns, random timing behaviours, and chaos. It is largely robust against modulations but degrades under large random modulations. MPD is compared to a deep autoencoder neural network and to the WaveGlow neural network. In terms of time-domain fitting errors, MPD outperforms the other two approaches unless random modulations are large, exceeding the better of the two by up to approximately 5 dB; for large random modulations, the deep autoencoder achieves the smallest fitting errors. In terms of magnitude spectrum fitting errors, WaveGlow is superior except for natural input GAWs containing only nondiplophonic types of dysphonia. Pulse timing errors are also shown to favour MPD.
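As an illustrative sketch only (not the authors' implementation), the model GAW described above can be synthesized by summing two pGAWs, each a train of single-peak pulses whose pulse-to-pulse cycle length and amplitude are randomly modulated. All function names and parameter values below are hypothetical choices for demonstration:

```python
import numpy as np

def pgaw(fs, duration, f0, jitter=0.02, shimmer=0.05, seed=0):
    """Synthesize one partial GAW (pGAW): a train of single-peak
    (raised-cosine-shaped) pulses with random pulse-to-pulse
    cycle-length (jitter) and amplitude (shimmer) modulation.
    Jitter/shimmer values are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    n = int(fs * duration)
    gaw = np.zeros(n)
    t = 0.0
    while t < duration:
        # modulated cycle length and pulse amplitude
        T = (1.0 / f0) * (1.0 + jitter * rng.standard_normal())
        a = 1.0 + shimmer * rng.standard_normal()
        i0 = int(t * fs)
        i1 = min(int((t + T) * fs), n)
        if i1 > i0:
            # single-peak pulse spanning one cycle
            phase = np.linspace(0.0, np.pi, i1 - i0, endpoint=False)
            gaw[i0:i1] += a * np.sin(phase) ** 2
        t += T
    return gaw

fs = 8000  # Hz
# model GAW: superposition of two pGAWs with different f_o's,
# as in a diplophonic voice
gaw = pgaw(fs, 0.5, f0=110, seed=1) + pgaw(fs, 0.5, f0=150, seed=2)
```

In an analysis-by-synthesis scheme such as MPD, such a synthetic GAW would be fitted to an input GAW by optimizing the pulse parameters of both trains; the sketch above only shows the forward (synthesis) direction.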