Natural Language Watermarking by Morpheme Segmentation

Miyoung Kim

doi:10.1109/aciids.2009.21

Abstract

This paper explores a method for Korean text watermarking and develops a morpheme-based scheme in which a predicate nominal is segmented into a nominal and a predicate. Korean, as an agglutinative language, provides good ground for morpheme-based natural language watermarking. A Korean word usually consists of a content morpheme and function morphemes. A predicate nominal, however, has two content morphemes: a nominal and a predicate. We therefore propose a method that separates a predicate nominal into two words and assigns one content morpheme to each of the new words. The division of a predicate nominal does not change the meaning of the sentence, and it preserves the naturalness of the sentence. Our proposed natural language watermarking method consists of five procedures. First, we perform morphological analysis of the unmarked text. Second, we choose target predicate nominals for division, determine the division type of each, and assign an insertion bit according to the division type. Third, we embed a watermark bit for each predicate nominal. Fourth, if the watermark bit does not correspond to the insertion bit, we divide the predicate nominal into two words. Finally, we obtain the marked text. The experimental results show that the rate of unnatural sentences in the marked text is significantly lower than that of previous systems. They also show that the marked text keeps the same style and carries the same information without semantic distortion.
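
The five procedures can be illustrated with a minimal sketch. It is not the paper's implementation: the `Token` structure, the `embed` function, the way the insertion bit is stored, and the sample sentence are all assumptions made for illustration. Morphological analysis (the first procedure) and the choice of targets and insertion bits (the second) are assumed to have been done already and are represented by hand-filled fields.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Token:
    """One word of the analysed text (hypothetical structure, not from the paper)."""
    surface: str                          # the word as it appears in the unmarked text
    nominal_word: Optional[str] = None    # word carrying the nominal, if divided
    predicate_word: Optional[str] = None  # word carrying the predicate, if divided
    insertion_bit: Optional[int] = None   # assumed to be derived from the division type


def embed(tokens: List[Token], watermark_bits: List[int]) -> List[str]:
    """Return the marked text as a list of words.

    A token is treated as a target predicate nominal when nominal_word is set.
    Each target consumes one watermark bit. Following the rule stated in the
    abstract, the target is divided into two words only when the watermark bit
    does not correspond to its insertion bit.
    """
    bits = iter(watermark_bits)
    marked: List[str] = []
    for tok in tokens:
        if tok.nominal_word is None:
            marked.append(tok.surface)    # not a target: copy unchanged
            continue
        bit = next(bits, None)            # None once the watermark is exhausted
        if bit is None or bit == tok.insertion_bit:
            marked.append(tok.surface)    # bits agree (or no bit left): keep one word
        else:
            marked.extend([tok.nominal_word, tok.predicate_word])  # divide
    return marked


# Illustrative, hand-analysed sentence (not taken from the paper).
# The last word is a predicate nominal that can be written as two words.
sample = [
    Token("나는"),
    Token("어제"),
    Token("공부했다", nominal_word="공부를", predicate_word="했다", insertion_bit=0),
]

print(" ".join(embed(sample, [0])))  # bits agree: the word is left as is
print(" ".join(embed(sample, [1])))  # bits differ: the word is divided
```

The sketch covers only embedding. How the watermark is read back from the marked text, and how division types map to insertion bits, are not specified in the abstract and are left out here.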

Full Text