Speech recordings carry useful information about the devices used to capture them. Here, acquisition device identification is studied using ‘sketches of features’ as intrinsic device characteristics. That is, starting from high-dimensional raw feature vectors, obtained either by averaging the log-spectrogram of a speech recording along the time axis or by stacking the parameters of each component of a Gaussian mixture model that models the speech recorded by a specific device, features of reduced size are extracted by mapping these raw feature vectors into a low-dimensional space. The mapping preserves the ‘distance properties’ of the raw feature vectors. Each entry of a sketch is obtained by taking the inner product of the raw feature vector with a vector of independent, identically distributed random variables drawn from a p-stable distribution. State-of-the-art classifiers, such as a sparse representation-based classifier or support vector machines, applied to the sketches yield an identification accuracy exceeding 94% on a set of eight landline telephone handsets from the Lincoln-Labs Handset Database. Perfect identification is reported for a set of 21 cell-phones of various models from seven different brands.
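The projection step described above can be illustrated with a minimal sketch, assuming NumPy and only the two p-stable families that have simple samplers (Gaussian for p = 2, Cauchy for p = 1). The function names, the sketch size `k`, and the distance estimators are illustrative assumptions and not the authors' implementation.

```python
import numpy as np


def stable_matrix(k, d, p, seed=0):
    """Draw a k x d matrix of i.i.d. p-stable variables (p = 1 or 2 only)."""
    rng = np.random.default_rng(seed)
    if p == 2:
        return rng.standard_normal((k, d))  # the Gaussian is 2-stable
    if p == 1:
        return rng.standard_cauchy((k, d))  # the Cauchy is 1-stable
    raise ValueError("this sketch only handles p = 1 or p = 2")


def sketch(x, R):
    """Each sketch entry is the inner product of x with one row of R."""
    return R @ x


def estimate_distance(sx, sy, p):
    """Estimate ||x - y||_p from two sketches built with the same R."""
    diff = sx - sy
    if p == 2:
        # Entries of diff are N(0, ||x - y||_2^2); use the root mean square.
        return np.linalg.norm(diff) / np.sqrt(diff.size)
    if p == 1:
        # Entries of diff are Cauchy with scale ||x - y||_1;
        # the median of |standard Cauchy| is 1.
        return np.median(np.abs(diff))
    raise ValueError("this sketch only handles p = 1 or p = 2")
```

Two raw feature vectors must be sketched with the same matrix `R` for the distance between their sketches to be meaningful. A larger `k` tightens the estimate at the cost of a longer sketch.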