Abstract

We address the problem of automatically detecting text that is covered by the deliberative process privilege, as defined by the U.S. Freedom of Information Act. A recent study in the ACM Journal on Computing and Cultural Heritage describes the creation of an annotated corpus in which each paragraph was manually labeled according to whether it met this privilege. The authors of that study tested Support Vector Machine and Logistic Regression classifiers using simple word-count-based features. We implement these classifiers, as well as expanded versions that include more linguistically complex features. After removing certain elements of the original corpus, we carry out experiments and observe a significant increase in classifier correctness when these features are used in conjunction with the simple word-count-based features. We also implement a BERT-based classifier and observe a further improvement.
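
To make the baseline setup concrete, the following is a minimal sketch of a paragraph-level logistic regression classifier over word-count features. It is not the implementation used in this work or in the earlier study: the four example paragraphs and their labels are invented for illustration, all function names are hypothetical, and a small pure-Python gradient-descent trainer stands in for whatever library classifier was actually used.

```python
import math
from collections import Counter

def tokenize(text):
    # Simplest possible tokenizer: lowercase and split on whitespace.
    return text.lower().split()

def build_vocab(paragraphs):
    # Map each distinct token to a feature index.
    vocab = {}
    for paragraph in paragraphs:
        for token in tokenize(paragraph):
            vocab.setdefault(token, len(vocab))
    return vocab

def featurize(paragraph, vocab):
    # Sparse word-count vector: {feature index: count}.
    counts = Counter(tokenize(paragraph))
    return {vocab[t]: c for t, c in counts.items() if t in vocab}

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(x, weights, bias):
    return sigmoid(bias + sum(weights[i] * v for i, v in x.items()))

def train(X, y, n_features, epochs=200, lr=0.1):
    # Plain stochastic gradient descent on the logistic loss.
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        for x, label in zip(X, y):
            err = predict_proba(x, weights, bias) - label
            for i, v in x.items():
                weights[i] -= lr * err * v
            bias -= lr * err
    return weights, bias

# Invented toy data (assumption): 1 = deliberative, 0 = not deliberative.
paragraphs = [
    "we recommend that the agency consider option two",
    "staff propose revising the draft policy before a decision",
    "the meeting was held on tuesday in room four",
    "the final report was published last year",
]
labels = [1, 1, 0, 0]

vocab = build_vocab(paragraphs)
X = [featurize(p, vocab) for p in paragraphs]
weights, bias = train(X, labels, len(vocab))
predictions = [int(predict_proba(x, weights, bias) >= 0.5) for x in X]
```

The richer linguistic features mentioned above would, under this sketch, simply add further entries to each feature dictionary alongside the word counts.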
