Abstract
Large language models (LLMs) show great potential for text processing. This study assesses how reliably such a model extracts structured knowledge from unstructured textual sources, specifically biographies from the Polish Biographical Dictionary. The model was tasked with extracting information about each individual: date and place of birth, death, and burial; family relationships; important people; related settlements and institutions; and positions held. The test was conducted on a sample of 250 biographies. The texts, written in Polish from the 1930s onwards, describe the lives of individuals from various historical periods. The results show that the LLM is very effective at identifying basic personal data, important family relationships, occupations, and offices held by the subjects. Weaker results were obtained for institutions and places associated with the subjects. These findings suggest that LLMs can efficiently assist in digitizing and structuring historical biographical data, offering a promising tool for improving historical knowledge bases and speeding up work relative to manual information extraction.