Abstract

It is now well known that practical steganalysis using machine learning techniques can be strongly biased by the problem of Cover Source Mismatch. Such a phenomenon usually occurs in machine learning when the training and the testing sets are drawn from different sources, i.e. when they do not share the same statistical properties. In the field of steganalysis, however, due to the small power of the signal targeted by steganalysis methods, it can drastically lower their performance. This paper aims to define, through practical experiments, what a source is in steganalysis. By assuming that two cover datasets coming from a common source should provide comparable performance in steganalysis, it is shown that the definition of a source is more closely related to the processing pipeline of the RAW images than to the sensor or the acquisition setup of the pictures. In order to measure the discrepancy between sources, this paper introduces the concept of consistency between sources, which quantifies how much two sources are subject to Cover Source Mismatch. We show that by adopting training design, we can increase the consistency between the training set and the testing set. To measure how much image processing operations may help the steganographer, this paper also introduces the intrinsic difficulty of a source. It is observed that some processes, such as the choice of JPEG quantization tables or the development pipeline, can dramatically increase or decrease the performance of steganalysis methods, and that other parameters, such as the ISO sensitivity or the sensor model, have a minor impact on performance.
