Abstract

The current online digital world, consisting of thousands of newspapers, blogs, social media, and cloud file sharing services, provides easy and unlimited access to a vast wealth of text content. Making copies of this content is simple and virtually costless. As a result, producers and owners of text content are interested in protecting their intellectual property (IP) rights. Digital watermarking has become crucially important for the protection of digital content. Among all media, text watermarking poses many challenges, since text has a low capacity to embed a watermark and allows only a restricted number of alternative syntactic and semantic permutations. This becomes even harder when authors want to protect not just a whole book or article, but each single sentence or paragraph, a problem well known to copyright law. In this paper, we present a fine-grain text watermarking method that protects even small portions of the digital content. The core method is based on substituting homoglyph characters for Latin symbols and whitespaces. It produces a watermarked version of the original text while preserving the anonymity of users, in accordance with the right to privacy. In particular, the embedding and extraction algorithms continuously protect the whole document with the watermark in a fine-grain fashion. The method ensures visual indistinguishability and length preservation, meaning that it adds no overhead to the original document, and it is robust to the copy and paste of small excerpts of the text. We use a real dataset of 1.8 million New York Times articles to evaluate our method. We evaluate and compare its robustness against common attacks, and we propose a new measure of robustness to partial copy and paste. The results show the effectiveness of our approach, which needs an average of 101 characters to embed the watermark and protects excerpts of paragraph length or shorter 94.5% of the time.
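
The abstract describes embedding a watermark by substituting homoglyphs for Latin letters and whitespaces. A minimal sketch of that idea follows. It assumes a small, hypothetical substitution table (a few Latin-to-Cyrillic lookalikes plus one alternative space), one bit per carrier character, and cyclic repetition of the watermark across the text; the paper's actual tables, bit encoding, and excerpt synchronization are not given here, so `HOMOGLYPHS`, `to_bits`, `embed`, and `extract` are illustrative names only.

```python
# Hypothetical homoglyph table (an assumption, not the paper's table):
# each carrier maps to a visually similar code point. One code point
# replaces one code point, so the text length is preserved.
HOMOGLYPHS = {
    "a": "\u0430",  # CYRILLIC SMALL LETTER A
    "c": "\u0441",  # CYRILLIC SMALL LETTER ES
    "e": "\u0435",  # CYRILLIC SMALL LETTER IE
    "o": "\u043e",  # CYRILLIC SMALL LETTER O
    "p": "\u0440",  # CYRILLIC SMALL LETTER ER
    "x": "\u0445",  # CYRILLIC SMALL LETTER HA
    "y": "\u0443",  # CYRILLIC SMALL LETTER U
    " ": "\u2004",  # THREE-PER-EM SPACE in place of the ASCII space
}
# Reverse lookup: homoglyph -> original carrier.
REVERSE = {v: k for k, v in HOMOGLYPHS.items()}


def to_bits(value: int, width: int) -> list[int]:
    """Big-endian list of the `width` low-order bits of `value`."""
    return [(value >> (width - 1 - i)) & 1 for i in range(width)]


def embed(text: str, watermark: int, width: int) -> str:
    """Write the watermark bits cyclically over every carrier character.

    Bit 0 keeps the original character, bit 1 swaps in its homoglyph.
    Repeating the bit sequence through the whole text is what lets a
    short excerpt still hold a (rotated) copy of the mark.
    """
    bits = to_bits(watermark, width)
    out = []
    i = 0  # index of the next carrier position
    for ch in text:
        if ch in HOMOGLYPHS:
            out.append(HOMOGLYPHS[ch] if bits[i % width] else ch)
            i += 1
        else:
            out.append(ch)
    return "".join(out)


def extract(marked: str, width: int) -> int:
    """Read the first `width` carrier bits of a text marked from its start."""
    value = 0
    count = 0
    for ch in marked:
        if ch in HOMOGLYPHS:
            bit = 0  # original carrier kept
        elif ch in REVERSE:
            bit = 1  # homoglyph substituted
        else:
            continue  # not a carrier, carries no bit
        value = (value << 1) | bit
        count += 1
        if count == width:
            return value
    raise ValueError("text too short to carry a full watermark")


original = "Copy and paste of small excerpts must still carry the mark."
marked = embed(original, watermark=0b10110010, width=8)
assert len(marked) == len(original)  # length preservation
assert extract(marked, width=8) == 0b10110010
```

Because every substitution swaps one code point for another, the marked text has exactly the same length as the original, which mirrors the length-preservation property. The sketch assumes the input contains no genuine Cyrillic characters, and `extract` only reads a copy aligned with the start of the text; recovering a mark from an arbitrary excerpt would additionally require a way to locate where a watermark copy begins.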

Highlights

  • The last decades have been characterized by the easy availability of millions upon millions of digital contents that meet several kinds of user needs, both in professional activities and in social interactions

  • We conduct several experiments to assess the crucial properties of the proposed approach: the number of symbols required to embed a full watermark, the imperceptibility of changes in the watermarked text with respect to the original text, and the robustness of the watermark

  • Because we want to show the capability of the fine-grain method at the paragraph level, we extract only the lead paragraph from each article
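
One property measured in the experiments is the number of symbols required to embed a full watermark. Assuming each carrier character holds one bit, that quantity can be approximated as the shortest span, from a given start position, containing as many carriers as the watermark has bits. The sketch below averages that span over all start positions; the carrier set `CARRIERS` and the helpers `span_to_full_mark` and `average_span` are hypothetical, and this is not the authors' evaluation code.

```python
from typing import Optional

# Hypothetical carrier set: characters assumed to have a homoglyph or an
# alternative whitespace available, each carrying one watermark bit.
CARRIERS = frozenset("aceopxy ")


def span_to_full_mark(text: str, start: int, width: int) -> Optional[int]:
    """Length of the shortest span beginning at `start` that holds `width` carriers.

    Returns None when the rest of the text cannot hold a full watermark copy.
    """
    found = 0
    for end in range(start, len(text)):
        if text[end] in CARRIERS:
            found += 1
            if found == width:
                return end - start + 1
    return None


def average_span(text: str, width: int) -> float:
    """Mean span length over every start position that can hold a full copy."""
    spans = []
    for start in range(len(text)):
        span = span_to_full_mark(text, start, width)
        if span is not None:
            spans.append(span)
    if not spans:
        raise ValueError("text too short to carry a full watermark")
    return sum(spans) / len(spans)


sample = "Copy and paste of small excerpts must still carry the mark."
print(average_span(sample, width=16))
```

The average depends heavily on the carrier set: the more characters that admit an indistinguishable substitute, the shorter the span needed per watermark copy, and the smaller the excerpt that can still be protected.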

Summary

Introduction

The last decades have been characterized by the easy availability of millions upon millions of digital contents that meet several kinds of user needs, both in professional activities and in social interactions. An important reason for the proliferation of digital contents is the increasing use of online communication platforms, such as websites, social media, and cloud file sharing services. These platforms have changed user habits by increasing the copying and sharing of digital contents, namely text, audio, images, and video [1]. While current digital technologies make copying and sharing these contents easy, the copy is often unattributed, resulting in a misappropriation of the original authors' intellectual property. In several contexts, such as online newspapers and blogs, content owners have strong interests in protecting their IP rights in order to preserve their business. A visible watermark may not be readable, that is, a user can visually detect it but cannot read its content.
