Abstract

Automatic classification of electronic records is necessary to address a brewing crisis in the recordkeeping discipline, caused by escalating data volumes and digital rights legislation. Current solutions usually employ expert systems that classify records based on their metadata, but this approach is becoming unfeasible because records are increasingly varied and metadata is increasingly scarce. Now that the records themselves are machine readable, text classification is a promising alternative. In this study, the performance of traditional text classification techniques was compared with that of newer natural language processing technologies in a series of experiments using authentic records data. While the latest Transformer language models showed superior classification performance, traditional methods still performed well. These results were discussed by a focus group of records managers, who believe that text classification can help them manage risk and meet compliance obligations. This is a first step toward the aspiration of synthesizing narrative from a corpus of records.
