Abstract

Abstract Observed textual similarities between different pieces of writing are frequently cited by textual scholars as grounds for interpretative stances about the meaning of a passage and its authorship, authenticity, and accuracy. Historically, identifying occurrences of such similarities has been a matter of extensive knowledge and recall of the content and locations of passages contained within certain texts, together with painstaking manual comparison by examining printed copies, use of concordances, or more recently, appropriate use of full-text searchable database systems. The development of increasingly comprehensive and accurate digital corpora of early Chinese transmitted writing raises many opportunities to study these phenomena using more systematic digital techniques. These offer the promise of not only vast savings in time and labor but also new insights made possible only through exhaustive comparisons of types that would be entirely impractical without the use of computational methods. This article investigates and contrasts unsupervised techniques for the identification of textual similarities in premodern Chinese works in general, and the classical corpus in particular, taking the text of the Mozi 墨子 as a concrete example. While specific examples are presented in detail to concretely demonstrate the utility and potential of the techniques discussed, all of the methods described are generally applicable to a wide range of materials. With this in mind, this article also introduces an open-access platform designed to help researchers quickly and easily explore these phenomena within those materials most relevant to their own work.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.