This project explores audio classification through representation learning in the TensorFlow framework. The methodology begins with engineering waveform features from raw audio, which are then used to train convolutional neural networks (CNNs) for representation learning. By transforming the audio signals into a structured format amenable to convolutional processing, the system is designed to capture the intrinsic properties and patterns embedded in the sound waves. The feature engineering process is detailed: envelope features such as the homomorphic envelogram, the Hilbert envelogram, and wavelet decompositions extract meaningful information from the raw audio signals, and these engineered features provide a robust foundation for the subsequent convolutional layers. The CNNs are architected to learn hierarchical representations, capturing both low-level and high-level audio characteristics. This study reinforces the significance of tailored feature engineering in deep learning, demonstrates an effective audio classification pipeline built on representation learning, and opens new avenues for research in audio signal processing and machine learning.
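
As a minimal sketch of two of the envelope features named above, the snippet below computes a Hilbert envelope and a homomorphic envelogram with SciPy. The cutoff frequency, filter order, and the synthetic test signal are illustrative assumptions, not the project's actual settings.

```python
import numpy as np
from scipy.signal import hilbert, butter, filtfilt

def hilbert_envelope(x):
    # Magnitude of the analytic signal gives the instantaneous amplitude envelope.
    return np.abs(hilbert(x))

def homomorphic_envelogram(x, fs, cutoff_hz=8.0, order=1):
    # Log-compress the Hilbert envelope, low-pass filter it, then exponentiate.
    # cutoff_hz and order are assumed values for illustration only.
    env = hilbert_envelope(x)
    b, a = butter(order, cutoff_hz / (fs / 2), btype="low")
    return np.exp(filtfilt(b, a, np.log(env + 1e-10)))

fs = 2000
t = np.arange(fs) / fs  # 1 second of samples
# Amplitude-modulated tone as a stand-in for a raw audio recording.
x = (1.0 + 0.5 * np.sin(2 * np.pi * 3 * t)) * np.sin(2 * np.pi * 100 * t)
env = homomorphic_envelogram(x, fs)
```

The resulting envelope is a slowly varying, strictly positive signal with the same length as the input, suitable for stacking with other engineered features as CNN input channels.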