Sinusoidal modeling has enjoyed a rich history in both speech and music applications, including sound transformations, compression, denoising, and auditory scene analysis. For such applications, the underlying signal model must efficiently capture salient audio features (Goodwin 1998). In this article, we present an accurate, efficient, and flexible three-part model for audio signals consisting of sines, transients, and noise. The model extends spectral modeling synthesis (SMS) (Serra and Smith 1990) with an explicit, flexible transient model called transient-modeling synthesis (TMS).

The sinusoidal transformation system (STS) (McAulay and Quatieri 1986) and SMS find the slowly varying sinusoidal components in a signal using spectral-peak-picking algorithms. Subtracting the synthesized sinusoids from the original signal leaves a residual consisting of transients and noise (Serra 1989; George and Smith 1992). However, sinusoids do not model this residual well. Although it is possible to model transients and noise as a sum of sinusoids (as with the Fourier transform), doing so is neither efficient, because transient and noisy signals require many sinusoids for their description, nor meaningful, because transients are short-lived signals, whereas the sinusoidal model uses sinusoids that are active on a much longer time scale.

In the STS system (generally applied to speech), the transient-plus-noise residual is often masked sufficiently to be ignored (McAulay and Quatieri 1986). In music applications, however, this residual is often important to the integrity of the signal. The SMS system extends the sinusoidal model by explicitly modeling the residual as slowly varying filtered white noise. Although this technique has been very successful, transients do not fit well into this model: transients modeled as filtered noise lose sharpness in their attack and tend to sound dull. Because transients are