Abstract

With the rising popularity of user-generated genealogical family trees, new genealogical information systems have been developed. State-of-the-art natural language question answering algorithms use deep neural network (DNN) architectures based on self-attention networks. However, some of these models use sequence-based inputs and are not suited to graph-based structures, while graph-based DNN models rely on a degree of knowledge graph comprehensiveness that does not exist in the genealogical domain. Moreover, these supervised DNN models require training datasets that are absent in the genealogical domain. This study proposes an end-to-end approach for question answering using genealogical family trees by: (1) representing genealogical data as knowledge graphs, (2) converting them to texts, (3) combining them with unstructured texts, and (4) training a transformer-based question answering model. To evaluate the need for a dedicated approach, the fine-tuned model (Uncle-BERT), trained on the auto-generated genealogical dataset, was compared with state-of-the-art question answering models. The findings indicate that there are significant differences between answering genealogical questions and open-domain questions. Moreover, the proposed methodology reduces complexity while increasing accuracy and may have practical implications for genealogical research and real-world projects, making genealogical data accessible to experts as well as the general public.
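The four-step pipeline described above can be illustrated with a minimal sketch. The snippet below is not the authors' implementation: the triples, the relation-to-sentence templates, and the off-the-shelf SQuAD-fine-tuned reader (standing in for the fine-tuned Uncle-BERT, which is not assumed to be publicly released) are illustrative assumptions. It only shows how a verbalized family-tree graph can be merged with free text and passed to a transformer question answering model.

```python
# Minimal sketch of the pipeline in the abstract, under assumed toy data.
# Steps (1)-(2): verbalize a tiny family-tree knowledge graph into sentences;
# step (3): combine with unstructured text; step (4) is approximated by an
# off-the-shelf SQuAD-fine-tuned model instead of the paper's Uncle-BERT.
from transformers import pipeline

# Toy knowledge graph: (subject, relation, object) triples from a family tree.
triples = [
    ("Emily", "sibling", "Anna"),
    ("Emily", "sibling", "Clara"),
    ("Emily", "parent", "John"),
]

# Hypothetical templates that convert graph edges into plain sentences.
templates = {
    "sibling": "{s} has a sister named {o}.",
    "parent": "{s}'s parent is {o}.",
}

graph_text = " ".join(templates[r].format(s=s, o=o) for s, r, o in triples)

# Unstructured text (e.g., a biographical note) merged with the verbalized graph.
free_text = "Emily was born in 1892 and grew up with her two sisters."
context = graph_text + " " + free_text

# Stand-in extractive QA model; a genealogy-specific model would be fine-tuned
# on an auto-generated genealogical dataset, as the abstract describes.
qa = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")
print(qa(question="How many sisters did Emily have?", context=context))
```

In a real pipeline, the verbalization templates would be derived from the family-tree schema (e.g., GEDCOM-style relations), and the reader would be fine-tuned on the auto-generated genealogical question answering dataset rather than used off the shelf.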

Highlights

  • The popularity of “personal heritage” (user-generated genealogical family trees) has increased in recent years, driven by new digital services such as online family tree sharing sites, family tree creation software, and even self-service DNA analysis from companies like Ancestry and MyHeritage

  • Evaluation was performed by calculating the F1 score for a mix of random questions of all types (a minimal F1 sketch follows these highlights)

  • This finding indicates that the genealogy domain is distinctive and requires a dedicated training dataset and fine-tuned deep neural network (DNN) model
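
The F1 metric referred to in the second highlight can be sketched as the standard SQuAD-style token-overlap score; the helper below and its toy prediction/gold pairs are illustrative assumptions, not the authors' exact scoring script.

```python
# Sketch of SQuAD-style token-overlap F1 between a predicted and a gold answer.
from collections import Counter

def f1_score(prediction: str, ground_truth: str) -> float:
    pred_tokens = prediction.lower().split()
    gold_tokens = ground_truth.lower().split()
    # Count tokens shared by prediction and gold answer (with multiplicity).
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

# The score is averaged over a mix of questions; toy examples for illustration.
examples = [("two sisters", "two"), ("John Smith", "John Smith")]
print(sum(f1_score(pred, gold) for pred, gold in examples) / len(examples))
```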


Editors

Solicited reviews: Ricardo Usbeck, University of Hamburg; Isaiah Onando Mulang, IBM, Kenya; one anonymous reviewer

Introduction
Related work
Question answering using DNN
Sub-graph extraction and semantic representation
Result
Datasets
Figure: “How many sisters did Emily have?” (example question)
Results
Overall
Conclusions and future work
