Using NLP to Generate MARC Summary Fields for Notre Dame's Catholic Pamphlets

Jeremiah Flannery

doi:10.23974/ijol.2020.vol5.1.158

Abstract

Three NLP (Natural Language Processing) automated summarization techniques were tested on a special collection of Catholic Pamphlets acquired by Hesburgh Libraries. The automated summaries were generated after feeding the pamphlets as .pdf files into an OCR pipeline. Extensive data cleaning and text preprocessing were necessary before the summarization algorithms could be run. Under the standard ROUGE F1 scoring technique, the BERT Extractive Summarizer had the best summarization score, meaning it most closely matched the human reference summaries. The BERT Extractive technique yielded an average ROUGE F1 score of 0.239. The Gensim Python package implementation of TextRank scored 0.151. A hand-implemented TextRank algorithm created summaries that scored 0.144. This article covers the implementation of automated pipelines to read PDF text, the strengths and weaknesses of automated summarization techniques, and what the successes and failures of these summaries mean for their potential to be used in Hesburgh Libraries.
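To make the scores above easier to interpret, here is a minimal sketch of how a ROUGE F1 score can be computed for one generated summary against one human reference. It assumes the unigram variant (ROUGE-1) and a simple lowercase word tokenizer; the abstract does not say which ROUGE variant or tokenizer was used, so both are illustrative assumptions, and the function names are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens (assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rouge1_f1(candidate, reference):
    """Unigram-overlap F1 between a candidate summary and a reference summary."""
    cand_counts = Counter(tokenize(candidate))
    ref_counts = Counter(tokenize(reference))
    # Clipped overlap: each word counts at most as often as it appears in both texts.
    overlap = sum((cand_counts & ref_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())  # share of candidate words that match
    recall = overlap / sum(ref_counts.values())      # share of reference words recovered
    return 2 * precision * recall / (precision + recall)


score = rouge1_f1(
    "the pamphlet explains the mass",
    "this pamphlet explains the catholic mass",
)
print(round(score, 3))
```

In this toy pair, four of the five candidate words and four of the six reference words overlap, so precision is 0.8 and recall is about 0.667. An average of such per-document scores over a collection is the kind of figure the abstract reports.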

Summary

Introduction

Of all the information fields available in the MARC catalog, the summary field ranks near the top. It is behind only the author and title fields in importance to patrons (Lundgren and Simpson 1999). Internal analysis (Unpublished Data, Kasten & Flannery, 2020) of English monographs showed that Notre Dame patrons check out records that include a summary field at higher frequencies, even when adjusted for the estimated popularity of the monograph. When special collections are brought into the library catalog, it is often not possible to leverage existing summaries. In 2019, when the University of Notre Dame brought in a special collection of over 5500 Catholic Pamphlets, our department did not expend hundreds of hours (or more!) of staff time to read, let alone write summaries of, the 5500 texts. Natural Language Processing (NLP) Summarization methods that utilize machine

Journal: International Journal of Librarianship
Publication Date: Jul 23, 2020
Citations: 1
License type: CC BY 4.0
