Abstract

With the widespread availability of cell-phone recording devices, source cell-phone identification has become a hot topic in multimedia forensics. Research on source cell-phone identification under clean conditions has achieved good results, but performance in noisy environments is still unsatisfactory. This paper proposes a novel source cell-phone identification system suitable for both clean and noisy environments, using spectral distribution features of the constant Q transform (CQT) domain and a multi-scene training method. Our analysis shows that the main difficulty lies in distinguishing different models of cell-phones from the same brand, and that their small differences lie mainly in the low and middle frequency bands. Therefore, this paper extracts spectral distribution features from the CQT domain, which has higher frequency resolution in the mid-to-low frequency range. To evaluate the effectiveness of the proposed feature, four classifiers, Support Vector Machine (SVM), Random Forest (RF), Convolutional Neural Network (CNN) and Recurrent Neural Network with Bidirectional Long Short-Term Memory (RNN-BLSTM), are used to identify the source recording device. Experimental results show that the proposed features have superior performance. Compared with Mel frequency cepstral coefficients (MFCC) and linear frequency cepstral coefficients (LFCC), they improve identification accuracy for cell-phones of the same brand, whether the test speech is clean or noisy. In addition, CNN achieves outstanding classification performance. Regarding model training, models built with the multi-scene training method distinguish devices better in noisy environments than models built with the single-scene training method.
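The claim that the CQT domain has higher frequency resolution in the mid-low band follows from its geometrically spaced bins, whose centre frequency divided by bandwidth is a constant Q: low bins are narrow (long analysis windows), high bins are wide. The summary does not define the spectral distribution feature itself, so the sketch below assumes a simple version, the mean log power of each CQT bin over all frames, computed with a naive Brown-style CQT. The function names, the 100 Hz lowest bin, 12 bins per octave, 3 octaves and 512-sample hop are illustrative assumptions, not the paper's settings.

```python
import cmath
import math


def cqt_frequencies(f_min, bins_per_octave, n_bins):
    """Geometrically spaced CQT centre frequencies: f_k = f_min * 2**(k / B)."""
    return [f_min * 2 ** (k / bins_per_octave) for k in range(n_bins)]


def cqt_frame(x, start, fs, freqs, q):
    """Naive constant-Q transform of one frame starting at sample `start`.

    Bin k correlates the signal with a Hann-windowed complex exponential of
    length N_k = ceil(Q * fs / f_k): low bins get long windows (narrow
    bandwidth f_k / Q, fine resolution), high bins get short ones.
    """
    out = []
    for f_k in freqs:
        n_k = math.ceil(q * fs / f_k)
        acc = 0j
        for n in range(n_k):
            w = 0.5 - 0.5 * math.cos(2 * math.pi * n / n_k)  # Hann window
            acc += x[start + n] * w * cmath.exp(-2j * math.pi * q * n / n_k)
        out.append(abs(acc) / n_k)  # normalise by window length
    return out


def cqt_spectral_distribution(x, fs, f_min=100.0, bins_per_octave=12,
                              n_octaves=3, hop=512):
    """Hypothetical spectral-distribution feature (assumption, not the
    paper's definition): mean log power of each CQT bin over all frames."""
    freqs = cqt_frequencies(f_min, bins_per_octave, bins_per_octave * n_octaves)
    q = 1.0 / (2 ** (1.0 / bins_per_octave) - 1.0)  # constant quality factor
    longest = math.ceil(q * fs / freqs[0])  # window length of the lowest bin
    if len(x) < longest:
        raise ValueError("signal shorter than the lowest-bin window")
    starts = range(0, len(x) - longest + 1, hop)
    sums = [0.0] * len(freqs)
    for s in starts:
        for k, m in enumerate(cqt_frame(x, s, fs, freqs, q)):
            sums[k] += math.log(m * m + 1e-12)  # small floor avoids log(0)
    return [v / len(starts) for v in sums]
```

For a 1 s, 200 Hz sine sampled at 8 kHz, `cqt_spectral_distribution` returns 36 values whose largest entry is bin 12, the bin centred at 100 · 2^(12/12) = 200 Hz. A practical system would use an FFT-based CQT implementation rather than these per-bin loops.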
With CNN, the average accuracy on clean speech files in the CKC speech database (CKC-SD) and the TIMIT Recaptured Database (TIMIT-RD) increased from 95.47% and 97.89% to 97.08% and 99.29%, respectively. For noisy speech files with both seen and unseen noise types, performance improved greatly, and most recognition rates exceeded 90%. Therefore, the source identification system in this paper is robust to noise.
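The summary says that multi-scene training improves noise robustness but not how the noisy training scenes are produced. A common approach, assumed here, is to add recorded noise to each clean utterance at several signal-to-noise ratios (SNR_dB = 10·log10(P_speech / P_noise)), keep the device label unchanged, and train one model on the clean and noisy copies together; single-scene training would use the clean copies alone. `mix_at_snr`, `build_multi_scene_set`, the noise names and the SNR values are hypothetical.

```python
import math
import random


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so that 10*log10(P_speech / P_noise) == snr_db, then add it."""
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[: len(speech)]
    p_s = sum(s * s for s in speech) / len(speech)
    p_n = sum(v * v for v in noise) / len(noise)
    if p_n == 0.0:
        raise ValueError("noise has zero power")
    # After scaling, noise power is p_s / 10**(snr_db / 10).
    scale = math.sqrt(p_s / (p_n * 10.0 ** (snr_db / 10.0)))
    return [s + scale * v for s, v in zip(speech, noise)]


def build_multi_scene_set(clean_utts, noises, snrs_db, rng=None):
    """Hypothetical multi-scene training set.

    clean_utts: list of (signal, device_label); noises: dict name -> signal.
    Returns (signal, device_label, scene) tuples: each clean utterance plus one
    noisy copy per (noise type, SNR) pair, all with the same device label.
    """
    if rng is None:
        rng = random.Random(0)
    out = []
    for speech, label in clean_utts:
        out.append((speech, label, "clean"))
        for name, noise in noises.items():
            if len(noise) < len(speech):
                raise ValueError(f"noise '{name}' shorter than utterance")
            for snr in snrs_db:
                # Random offset so utterances see different noise segments.
                start = rng.randrange(len(noise) - len(speech) + 1)
                segment = noise[start:start + len(speech)]
                out.append((mix_at_snr(speech, segment, snr), label,
                            f"{name}@{snr}dB"))
    return out
```

In this framing, "seen" and "unseen" noise types in the abstract correspond to test noises that do or do not appear in `noises` at training time.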

Highlights

  • Existing methods perform well, but the speech files targeted by source cell-phone recognition are almost always recorded in quiet environments

Summary

Introduction

With the development and advancement of digital multimedia and Internet technologies, a variety of powerful and easy-to-operate digital media editing software has emerged, bringing new problems and challenges to the availability of collected data, namely multimedia security issues. The experimental results indicated that the average accuracy for 15 kinds of devices was 96.65%. These features have achieved good results in the field of source cell-phone identification; however, most of these cepstral coefficients are constructed based on the perception characteristics of the human ear.

The rest of the paper is organized as follows: Section 2 analyzes the differences between speech files recorded by cell-phones of different brands and by different models of the same brand; Section 3 presents the CQT-domain spectral distribution features proposed in this paper on the basis of the device difference analysis, together with two traditional features, MFCC and LFCC; Section 4 introduces four classifiers and a flow chart of the cell-phone source identification algorithm; Section 5 describes the construction of the basic speech databases and the noisy speech databases; and Section 6 gives the experimental results.
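The remark that most cepstral coefficients are built on human-ear perception can be made concrete by comparing filterbank layouts: MFCC places its triangular filters evenly on the mel scale, LFCC evenly on a linear frequency scale. The sketch below uses the common HTK-style mel formula m = 2595·log10(1 + f/700); the 20 filters and the 0 to 4 kHz band are illustrative assumptions, and the function names are hypothetical.

```python
import math


def hz_to_mel(f):
    """HTK-style mel scale: roughly linear below 1 kHz, logarithmic above."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_centres(n_filters, f_low, f_high):
    """Filter centre frequencies (Hz) equally spaced on the mel scale (MFCC-style)."""
    lo, hi = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (hi - lo) / (n_filters + 1)  # n_filters + 2 edges, endpoints excluded
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_filters)]


def linear_centres(n_filters, f_low, f_high):
    """Filter centre frequencies (Hz) equally spaced in Hz (LFCC-style)."""
    step = (f_high - f_low) / (n_filters + 1)
    return [f_low + step * (i + 1) for i in range(n_filters)]
```

With 20 filters over 0 to 4 kHz, 9 mel-spaced centres fall below 1 kHz versus 5 linearly spaced ones: the mel layout follows hearing, while the linear layout spreads resolution evenly regardless of where device differences lie.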

Device Difference Analysis
Classifiers and Algorithm Introduction
RNN-BLSTM
Multi-Scene Training Recognition Systems
Basic Speech Databases
Noisy Speech Databases
Experimental Setup
Parameter Setup
Experiments carried out using two classifiers: SVM
Comparison of Features
Comparison
Comparison of Classifiers
Comparison of Single-Scene and Multi-Scene Training
Comparison of Different Identification Algorithms
Findings
Conclusions