Abstract

This paper proposes a method for automatically retrieving topic-specific discussions from .pdf documents. The main contribution of the suggested algorithm is that it isolates topical discussions independently of the document's structure and of where the disclosure appears within the .pdf. We demonstrate this property on corporate social responsibility (CSR) reporting, which varies considerably across companies and countries. The extraction success rate is calculated on a random sample of 50 annual reports in which human readers identified the CSR discussions; the retrieval rate exceeds 90%. Statistical validation also confirms that the approach captures the underlying CSR construct, as shown by its high correlation with a CSR performance rating.

