Abstract
This paper introduces the concept of semantic organizing processes as a means of inferring theoretically meaningful behavior from the observation of raw text. Semantic organizing processes are mechanisms by which a set of authors comes to produce texts that are similar in some observable, quantifiable way. We introduce three broad semantic organizing processes (authors sharing subject matter, authors sharing goals, and authors sharing sources) and argue that each process leads to texts that tend to share n-grams of a characteristic length: short n-grams for shared subject matter, moderate-length n-grams for shared goals, and long n-grams for shared sources. To test these hypotheses, we develop a novel n-gram extraction technique that captures text similarity based on n-grams of different lengths. We then apply the technique to a corpus in which author attributes are observable: the public statements of members of the U.S. Congress. Our results support the hypothesis that the three processes are reflected in distinct kinds of textual similarity. This paper presents the first empirical finding that different social processes are detectable through the structure of overlapping textual features. The finding has important implications for modeling text and for understanding the underlying social processes.
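The idea of comparing texts by shared n-grams of different lengths can be illustrated with a minimal Python sketch. This is not the extraction technique developed in the paper, which the abstract does not specify. It assumes whitespace tokenization, Jaccard overlap as the similarity measure, and illustrative lengths of 1, 3, and 8 tokens standing in for "short", "moderate", and "long"; the function names `ngrams` and `jaccard_by_length` are hypothetical.

```python
from typing import Dict, List, Sequence, Set, Tuple


def ngrams(tokens: List[str], n: int) -> Set[Tuple[str, ...]]:
    """Return the set of distinct n-grams (as tuples) in a token list."""
    # If the text has fewer than n tokens, the range is empty and so is the set.
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard_by_length(
    text_a: str,
    text_b: str,
    lengths: Sequence[int] = (1, 3, 8),  # assumed stand-ins for short/moderate/long
) -> Dict[int, float]:
    """Jaccard overlap of the two texts' n-gram sets, one score per length."""
    # Assumption: lowercase + whitespace split; a real pipeline would need
    # proper tokenization and punctuation handling.
    a = text_a.lower().split()
    b = text_b.lower().split()
    scores: Dict[int, float] = {}
    for n in lengths:
        grams_a, grams_b = ngrams(a, n), ngrams(b, n)
        union = grams_a | grams_b
        # Define the overlap as 0.0 when neither text has any n-gram of this length.
        scores[n] = len(grams_a & grams_b) / len(union) if union else 0.0
    return scores


if __name__ == "__main__":
    # Invented example sentences, not drawn from the paper's corpus.
    a = "we must protect social security for our seniors"
    b = "we must protect social security for our children"
    print(jaccard_by_length(a, b))
```

Under the abstract's hypotheses, two statements on the same topic would be expected to score relatively high at length 1 but low at length 8, while statements drawn from a common source would also score high at the longest length.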
More From: SSRN Electronic Journal