Abstract

This paper describes the KMI-Panlingua-IITKGP team's participation in the SigTyP 2020 Shared Task on the prediction of typological features. The task was to predict missing feature values for a language, given the name of its language family, its genus, its location (latitude and longitude coordinates and the name of the country where it is spoken), and a set of known feature-value pairs. The team submitted three systems: two rule-based and one hybrid. Of these three, one rule-based system performed best on the test set. All the systems were ‘constrained’ in the sense that no additional dataset or information, other than what the organisers provided, was used to develop them.

Highlights

  • This paper is a detailed documentation of the KMI-Panlingua-IITKGP team’s system submission at the SigTyP 2020 Shared Task on the prediction of typological features

  • The statistical system is an extension of our baseline system where the absence of both the language family and the genus in the training data is handled in a more principled way

  • All three systems rest on two notions: languages belonging to the same language family share typological properties, and languages belonging to different language families share areal properties when they are in regular contact, mainly because their speakers live in close geographical proximity


Summary

Introduction

This paper is a detailed documentation of the KMI-Panlingua-IITKGP team’s system submission at the SigTyP 2020 Shared Task on the prediction of typological features. The objective of the task is to develop a computational model that predicts missing linguistic features of a language, given its location, language family, genus, and a set of feature-value pairs. The paper provides an automatic system for extracting the feature-value pairs of a given language, a tedious job if done manually. It compares three different systems and provides evidence that a statistical model gives better results for the given data set.
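
To make the prediction task concrete, here is a minimal sketch in Python. It is not the team's submitted system. It only illustrates the general idea of filling a missing feature from related languages: a majority vote within the genus, then within the family, and finally among the geographically nearest languages when neither is seen in the training data. The record layout, the function names (predict_feature, majority_value, haversine_km), the parameter k, and the toy data are all assumptions made for this illustration.

```python
from collections import Counter
from math import asin, cos, radians, sin, sqrt

# Toy training records; the layout and all values are hypothetical.
TRAINING = [
    {"name": "lang_a", "family": "F1", "genus": "G1", "lat": 10.0, "lon": 10.0,
     "features": {"word_order": "SOV"}},
    {"name": "lang_b", "family": "F1", "genus": "G1", "lat": 11.0, "lon": 10.0,
     "features": {"word_order": "SOV"}},
    {"name": "lang_c", "family": "F1", "genus": "G2", "lat": 12.0, "lon": 12.0,
     "features": {"word_order": "SVO"}},
    {"name": "lang_d", "family": "F2", "genus": "G3", "lat": 50.0, "lon": 50.0,
     "features": {"word_order": "SVO"}},
]


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))


def majority_value(languages, feature):
    """Most frequent value of `feature` among `languages`, or None if no one has it."""
    counts = Counter(
        lang["features"][feature] for lang in languages if feature in lang["features"]
    )
    return counts.most_common(1)[0][0] if counts else None


def predict_feature(target, feature, training, k=3):
    """Predict a missing feature value by backing off genus -> family -> location."""
    # Prefer the closest genealogical grouping that offers any evidence.
    for key in ("genus", "family"):
        peers = [lang for lang in training if lang[key] == target[key]]
        value = majority_value(peers, feature)
        if value is not None:
            return value
    # Assumed areal fallback: vote among the k nearest languages that have the feature.
    nearest = sorted(
        (lang for lang in training if feature in lang["features"]),
        key=lambda lang: haversine_km(target["lat"], target["lon"], lang["lat"], lang["lon"]),
    )
    return majority_value(nearest[:k], feature)


if __name__ == "__main__":
    unseen = {"family": "F9", "genus": "G9", "lat": 49.0, "lon": 49.0, "features": {}}
    print(predict_feature(unseen, "word_order", TRAINING, k=1))
```

In this sketch the geographic step is reached only when both the genus and the family are absent from the training data or offer no value for the feature. Ties in the vote are broken by whichever value was counted first, a simplification that a real system would likely handle more carefully.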

Dataset
Statistical System
Step 1
Step 2
Step 3
Hybrid System
Results
Conclusion
