Robust Speech Hashing for Digital Audio Forensics

Jaisson Vargas, Diego Renza, Dora M Ballesteros

doi:10.3390/app10010249

Open Access

https://doi.org/10.3390/app10010249

Journal: Applied Sciences
Publication Date: Dec 28, 2019
Citations: 3
License type: CC BY 4.0

Affiliation: Military University Nueva Granada

Abstract

Verifying the integrity and authenticity of multimedia content is an essential task in the forensic field, since it is what makes digital evidence admissible. The main objective is to establish whether the multimedia content has been manipulated in a way that significantly changes its content, such as the removal of noise (e.g., a gunshot) that could clarify the facts of a crime. In this project we propose a method to generate a summary value for audio recordings, known as a hash. Our method is robust: if the audio has been modified slightly (without changing its significant content) by perceptual manipulations such as MPEG-4 AAC compression, the hash value of the new audio is very similar to that of the original audio; conversely, if the audio is altered and its content changes, for example by a low-pass filter, the new hash value moves away from the original value. The method starts by computing MFCC (Mel-frequency cepstral coefficients) and reducing their dimensionality through principal component analysis (PCA). The reduced data is then encrypted, taking as inputs two values from a particular binarization system based on the Collatz conjecture. Finally, a robust 96-bit code is obtained, which varies little when perceptual modifications such as compression or amplitude modification are applied to the signal. According to experimental tests, the BER (bit error rate) between the hash value of the original audio recording and that of the manipulated audio recording is low for perceptual manipulations, i.e., 0% for FLAC and re-quantization, 1% on average for volume (−6 dB gain), and less than 5% on average for MPEG-4 and resampling (using the FIR anti-aliasing filter); it is more than 25% for non-perceptual manipulations such as low-pass filtering (3 kHz, fifth order), additive noise, cutting and copy-move.
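
The BER figures quoted above compare two hash values bit by bit. As a minimal sketch of that comparison (not the authors' implementation), the Python snippet below counts the differing bits between two equal-length hashes and divides by the total number of bits. The function name `bit_error_rate` and the two 96-bit example values are assumptions made for illustration; they are not taken from the paper.

```python
def bit_error_rate(hash_a: bytes, hash_b: bytes) -> float:
    """Return the fraction of bit positions in which two hashes differ."""
    if len(hash_a) != len(hash_b):
        raise ValueError("hashes must have the same length")
    if not hash_a:
        raise ValueError("hashes must not be empty")
    # XOR leaves a 1 in every position where the two bytes disagree.
    differing = sum(bin(x ^ y).count("1") for x, y in zip(hash_a, hash_b))
    return differing / (8 * len(hash_a))


# Hypothetical 96-bit (12-byte) hash values, invented for this example.
original = bytes.fromhex("a3f1c27e9b0455d8e6420f17")
# Flip the three lowest bits of the last byte to mimic a slight change.
modified = original[:-1] + bytes([original[-1] ^ 0b00000111])

ber = bit_error_rate(original, modified)
print(f"BER = {ber:.4f}")
```

Here 3 of the 96 bits differ, so the BER is 3/96, roughly 3%. In the terms used above, a value this low would be in the range reported for perceptual manipulations, whereas values above 25% were reported for non-perceptual ones.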

Highlights

The proliferation of technologies and platforms for sharing information has driven exponential growth in the volume of information in recent years
In non-perceptual manipulation (Figure 5d), most Bit Error Rate (BER) values fall in the last bin (>0.20), which means that the hash value has changed substantially and it will not be easy to discern whether the contents match
The hash value of a recording altered by volume adjustment should be very similar to that of the original recording, whereas a signal filtered with a low-pass filter (LPF) should yield a hash value distant from the original

Summary

Introduction

The proliferation of technologies and platforms for sharing information has driven exponential growth in the volume of information in recent years. How these challenges are addressed depends on the characteristics and availability of information on the content to be evaluated. This is usually achieved through the calculation of a numerical value based on input data, i.e., the use of cryptographic hash functions [14]. A static, or at least almost insensitive, behavior of the hash code with respect to permissible transformations of such content is desired [15]. In this context, robust hash functions for the identification of audio content have attracted recent interest due to their multiple applications. The hash value is expected to be invariant, or at least almost insensitive, to moderate or permissible transformations of audio signals, such as format changes. This can facilitate chain-of-custody processes for digital evidence, even when unintended errors occur in the processing of information.

Collatz Conjecture
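
For orientation, the Collatz conjecture concerns the map that sends an even integer n to n/2 and an odd integer n to 3n + 1; it states that iterating this map from any positive integer eventually reaches 1. The Python sketch below only generates such a sequence. It does not reproduce the binarization system that the method builds on it, which is not described in this summary, and the function name `collatz_sequence` is an assumption chosen for illustration.

```python
def collatz_sequence(n: int) -> list:
    """Return the Collatz sequence starting at n and ending at 1."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    sequence = [n]
    while n != 1:
        # Even values are halved; odd values are mapped to 3n + 1.
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        sequence.append(n)
    return sequence


print(collatz_sequence(6))  # [6, 3, 10, 5, 16, 8, 4, 2, 1]
```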

Proposed Method

Experimental Dataset

Performance Analysis

Analysis with Perceptual and Non-Perceptual Manipulations

Comparison with Related Works

Conclusions