Abstract

This paper describes a frame-based approach to audio signal decomposition. The approach first decomposes an audio signal into transient, sinusoidal and residual (TSR) components. Linear prediction is applied to each signal frame, and transient detection operates on the prediction error. Because sudden changes such as transients are difficult to predict, the prediction error is expected to have very high energy in transient regions. From the estimated envelope of the prediction error and its first-order statistical moments, an adaptive threshold is derived that ensures correct transient detection across a variety of audio signals. Once a transient region is detected in a signal frame, it is separated directly, yielding a first residual signal that contains sinusoids and noise. To extract the sinusoidal components, partial tracking based on psychoacoustic masking is performed on the first residual signal. A second residual signal is then obtained by subtracting the psychoacoustically relevant sinusoidal components. This second residual is processed further by tracking and removing the remaining sinusoidal components, yielding a final residual free of transient and sinusoidal components. Pitch and time scaling can then be applied to the decomposed signal components. Pitch scaling is applied only to the sinusoidal components; the transient and residual components are left unchanged. Time scaling is applied to the sinusoidal and residual components, whereas the transient components are only shifted in time.
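
The transient detection step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the predictor, the envelope estimator or the exact threshold rule, so the sketch assumes autocorrelation-method linear prediction, a moving average of the absolute prediction error as the envelope, and a threshold of the envelope mean plus a multiple of its standard deviation. The function names and the parameters `order`, `win` and `k` are hypothetical.

```python
import numpy as np

def lpc_coefficients(frame, order):
    """Autocorrelation-method LPC: solve the normal equations R a = r."""
    n = len(frame)
    # Unnormalised autocorrelation at lags 0..order.
    r = np.array([np.dot(frame[: n - lag], frame[lag:]) for lag in range(order + 1)])
    # Toeplitz autocorrelation matrix with a small diagonal load (assumption)
    # so that silent or nearly singular frames still solve.
    idx = np.abs(np.subtract.outer(np.arange(order), np.arange(order)))
    R = r[idx] + (1e-9 * r[0] + 1e-12) * np.eye(order)
    return np.linalg.solve(R, r[1:])

def prediction_error(frame, order=12):
    """Residual of predicting each sample from the previous `order` samples."""
    frame = np.asarray(frame, dtype=float)
    a = lpc_coefficients(frame, order)
    pred = np.zeros_like(frame)
    for lag in range(1, order + 1):
        pred[lag:] += a[lag - 1] * frame[:-lag]
    err = frame - pred
    # The first `order` samples lack a full history, so ignore them.
    err[:order] = 0.0
    return err

def detect_transients(frame, order=12, win=32, k=3.0):
    """Return (mask, envelope, threshold) for one signal frame.

    The threshold rule (mean + k * std of the envelope) is an assumed
    stand-in for the adaptive threshold mentioned in the abstract.
    """
    err = prediction_error(frame, order)
    # Envelope estimate: moving average of the absolute prediction error.
    envelope = np.convolve(np.abs(err), np.ones(win) / win, mode="same")
    threshold = envelope.mean() + k * envelope.std()
    return envelope > threshold, envelope, threshold
```

Because the threshold is derived from the statistics of the frame itself, it scales with the signal level rather than relying on a fixed constant. Samples flagged in `mask` mark a candidate transient region; removing that region from the frame would leave the first residual that the later sinusoidal tracking works on.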
