Abstract

A large portion of Internet bandwidth is used to transmit multimedia such as audio data. Because files are usually much larger than the maximum network packet size, audio data are segmented into fragments. For eavesdropping or network surveillance purposes, the first step of a sniffer may be to determine the codec that generated a fragment. This problem is usually modeled as a multi-class classification problem. The basic methods for determining the codec type of a fragment rely on the metadata in the corresponding file header. In a non-cooperative context, however, the whole file is not available, so statistical features extracted from the fragments are generally used instead, together with machine learning methods. In this paper, we present an end-to-end, scalable deep learning approach for the classification of audio codecs with variable bit rates. The method is based on an efficient variant of convolutional neural networks that learns hidden-layer representations to encode the input vectors. Moreover, by integrating the statistical features with the Cascade Deep Forest method, we jointly optimize the assignment of classification labels and learn features that are suitable for classification. In several experiments on a dataset of speech codecs, we demonstrate that our approach outperforms state-of-the-art methods by a significant margin.
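To make the notion of statistical features extracted from a fragment concrete, the following is a minimal sketch in Python. The abstract does not list the features actually used, so the byte histogram, Shannon entropy, mean, and standard deviation below are assumptions chosen for illustration, and the function name `fragment_features` is hypothetical.

```python
import math
from collections import Counter


def fragment_features(fragment: bytes) -> dict:
    """Illustrative statistics for one fragment (assumed, not the paper's feature set)."""
    if not fragment:
        raise ValueError("fragment must not be empty")
    n = len(fragment)
    counts = Counter(fragment)
    # Normalized byte-value histogram: 256 relative frequencies.
    histogram = [counts.get(value, 0) / n for value in range(256)]
    # Shannon entropy in bits per byte: 0 for constant data, 8 for uniform bytes.
    entropy = -sum(p * math.log2(p) for p in histogram if p > 0)
    # First- and second-order statistics of the raw byte values.
    mean = sum(fragment) / n
    variance = sum((value - mean) ** 2 for value in fragment) / n
    return {
        "histogram": histogram,
        "entropy": entropy,
        "mean": mean,
        "std": math.sqrt(variance),
    }
```

A feature vector of this kind could be passed to any multi-class classifier. It is meant only to show the shape of the input to such a pipeline, not to reproduce the approach described above.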
