Abstract

Journaling is a widely adopted technique, known to improve mental health and well-being by enabling reflection on past events. Large amounts of text in digital journaling applications can, however, hinder the reflection process due to information overload. Abstractive summarization can address this problem by generating short summaries that users can quickly glance at to reminisce. In this paper, we investigate the utility of large language models for autobiographical text summarization. We study two approaches to adapting a self-supervised learning (SSL) model to the domain of autobiographical text. The first employs transfer learning, fine-tuning the SSL model on our new autobiographical text summary dataset. The second leverages existing high-quality news summarization datasets mixed with our autobiographical summary dataset. We conducted mixed-methods research to analyze the performance of these two models. First, through objective evaluation using ROUGE and BART scores, we find that both approaches perform significantly better than the SSL model fine-tuned only on high-quality news datasets, showing the importance of domain adaptation and of an autobiographical text summary dataset for this task. Second, through a subjective evaluation on a crowd-sourcing platform, we assessed the generated summaries on quality criteria such as grammar, non-redundancy, structure, and coherence. We found that the summaries score above 4 out of 5 on all criteria, and that the two models show comparable results. We then deployed a proof-of-concept web-based journaling application to assess the practical, real-world implications of incorporating abstractive summarization in a digital journaling context. Participants showed a high consensus that the summaries generated by the system captured the main idea of their journal entry (80% of the 75 participants gave a Likert scale rating of out of 7.0, with an overall mean rating of 5.56 ± 1.32) while being factually correct, and they found it to be a useful feature of a journaling application. Finally, we conducted human evaluation studies to compare the quality of summaries generated by the commercial tool ChatGPT and by the mixed-distribution fine-tuned SSL model, and we present insights into these systems in the context of autobiographical abstractive text summarization. We have made our model, dataset, and subjective evaluation questionnaire openly available to the research community.
