Abstract

Recent court decisions in the United States call for the empirical testing of language-based author identification techniques. This article reports the results of such testing. The hypotheses tested concern syntactic analysis, syntactically-classified punctuation, sentential complexity, vocabulary richness, readability, content analysis, spelling errors, punctuation errors, word form errors, and grammatical errors. They are tested on a set of documents written by four women who are similar in age, educational level, and dialectal background; two of the women are Euro-American and two are Afro-American. Each hypothesis is tested separately to determine its ability to differentiate documents by different authors and to cluster documents by the same author. Hypotheses that quantify linguistic features are tested statistically using the chi-square statistic, and discrimination error rates are calculated. Only two hypotheses successfully differentiate and cluster documents: syntactic analysis and syntactically-classified punctuation.
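
The abstract does not spell out how the chi-square statistic is applied, so the following is only a minimal sketch of one plausible setup, not the article's procedure. It assumes each document is reduced to counts of a quantified feature (here, three hypothetical punctuation categories) and that two documents are compared through a contingency table. The function name, the counts, and the 0.05 significance level are illustrative assumptions.

```python
def chi_square_statistic(table):
    """Return Pearson's chi-square statistic and degrees of freedom
    for a contingency table given as a list of rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if feature use were independent of the document.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(col_totals) - 1)
    return statistic, dof


# Hypothetical counts, invented for illustration (not data from the study):
# rows are two documents, columns are three punctuation categories.
counts = [
    [30, 10, 20],
    [10, 30, 20],
]

stat, dof = chi_square_statistic(counts)

# Critical value of the chi-square distribution for alpha = 0.05, 2 degrees of freedom.
CRITICAL_005_DF2 = 5.991
differs = stat > CRITICAL_005_DF2

print(f"chi-square = {stat:.2f}, dof = {dof}, differs at 0.05: {differs}")
```

Under this setup, a statistic above the critical value would be read as the two documents differing in their use of the feature, and a statistic below it as the documents clustering together. A discrimination error rate could then be taken as the share of document pairs for which that reading disagrees with the known authorship.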
