Abstract

The recording device and the acoustic environment both play a major role in digital audio forensics. In this paper, we propose an acoustic source identification system that identifies both the recording device and the environment in which the recording was made. A hybrid Convolutional Neural Network (CNN) with Long Short-Term Memory (LSTM) is used to automatically extract environment and microphone features from the speech signal. In the experiments, we investigated the effect of using the voiced and unvoiced segments of speech on the accuracy of environment and microphone classification. We also studied the effect of background noise on microphone classification in three different environments: very quiet, quiet, and noisy. The proposed system uses a subset of the KSU-DB corpus containing 3 environments, 4 classes of recording devices, 136 speakers (68 males and 68 females), and 3600 recordings of words, sentences, and continuous speech. The model, called CRNN, combines the advantages of CNN and RNN (in particular, bidirectional LSTM) models. The speech signals were represented as spectrograms and fed to the CRNN model as 2D images. Using unvoiced speech segments, the proposed method achieved accuracies of 98% and 98.57% for environment and microphone classification, respectively.
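The spectrogram representation mentioned above can be illustrated with a minimal sketch. The abstract does not state the frame length, hop size, window type, or sampling rate used, so every parameter value below is an assumption chosen for illustration, and the function name `magnitude_spectrogram` is hypothetical. The sketch uses a naive DFT from the Python standard library only; it shows how a 1D signal becomes a 2D time-frequency array and is not the authors' implementation.

```python
import cmath
import math


def magnitude_spectrogram(signal, frame_len=64, hop=32):
    """Return a list of frames; each frame holds magnitudes for bins 0..frame_len//2.

    frame_len and hop are illustrative assumptions, not values from the paper.
    """
    # Hann window to reduce spectral leakage at the frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    n_bins = frame_len // 2 + 1
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        # Naive DFT, O(N^2) per frame; adequate for a sketch, an FFT is used in practice.
        spectrum = []
        for k in range(n_bins):
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames


# Hypothetical test signal: a pure tone placed exactly on one DFT bin.
sample_rate = 8000  # assumed sampling rate in Hz
tone_hz = 1000      # 1000 / (8000 / 64) = bin 8
signal = [math.sin(2 * math.pi * tone_hz * t / sample_rate) for t in range(512)]

spec = magnitude_spectrogram(signal)
# Index of the strongest frequency bin in the first frame.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
print(len(spec), len(spec[0]), peak_bin)
```

The result is a list of time frames by frequency bins. An array of this kind, typically log-scaled, is what a CNN front end would treat as a single-channel image.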

Highlights

  • Forensics refers to the science that uses scientific methods or expertise to investigate crimes or examine evidence that may be presented in a court of law

  • Digital media forensics is a branch of forensic science concerned with verifying that digital content is accurate and authentic [1]

  • We studied the effect of background noise on microphone classification in different environments, and the effect of microphone quality on environment classification

Summary

INTRODUCTION

Forensics refers to the science that uses scientific methods or expertise to investigate crimes or examine evidence that may be presented in a court of law. Digital audio forensics includes different activities, such as identifying speakers from the audio, identifying the environment or the recording device, and checking the integrity of the audio content. Content-based authentication analyzes the actual content of the audio recording and includes Electric Network Frequency (ENF) analysis, acquisition device identification, and environment identification. No previous study has used the proposed model to classify environments and microphones using the KSU-DB corpus.

LITERATURE REVIEW
ARCHITECTURE OF THE PROPOSED MODELS
EVALUATION OF SYSTEM ACCURACY
RESULTS AND DISCUSSION
