Abstract

The goal of the Langues et Civilisations à Tradition Orale (LACITO) Linguistic Archive project is to conserve and disseminate recorded and transcribed oral literature and other linguistic materials, mainly in unwritten languages, giving simultaneous access to sound recordings and text annotation. The project uses XML markup for the kinds of annotation traditionally used in field linguistics. Transcriptions are segmented into (roughly) sentences and into words. Annotations are associated with different levels: metadata at the text level, free translation at the sentence level, interlinear glosses at the word level, and so on. Time alignment is performed at the sentence level and, optionally, at the word level. The project makes maximum use of standard, generic software tools: marked-up data are processed using freely available XML software and displayed in standard browsers. The project has developed (1) an authoring tool, SoundIndex, to facilitate time alignment; (2) a Java applet that enables browsers to access time-aligned speech; (3) XSL stylesheets that specify “views” on the data; and (4) Common Gateway Interface (CGI) scripts that allow the user to choose documents and views and to enter queries. Current objectives include developing the annotation and software further to support linguistic research beyond simple browsing. At the time of writing, over 100 texts in 20 languages have been processed; some of these are available on the Internet for browsing and simple querying.
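To make the layered annotation model concrete, here is a minimal sketch of a time-aligned text with sentence-level translations and word-level glosses, plus two operations the abstract describes: finding the sentence being played at a given moment, and rendering an interlinear "view". The project itself uses XSL stylesheets and a Java applet for these tasks; the Python below is only a stand-in. All element and attribute names (`TEXT`, `HEADER`, `S`, `W`, `FORM`, `TRANSL`, `AUDIO`, `start`, `end`), the function names, and the sample forms and glosses are hypothetical and invented for illustration, not the project's published schema or data.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: element/attribute names and the forms/glosses are
# invented placeholders, not the LACITO schema or real language data.
SAMPLE = """
<TEXT id="demo">
  <HEADER><TITLE>Demo story</TITLE></HEADER>
  <S id="s1">
    <AUDIO start="0.00" end="2.35"/>
    <FORM>na ka-mu</FORM>
    <TRANSL>I went.</TRANSL>
    <W><FORM>na</FORM><TRANSL>1SG</TRANSL></W>
    <W><FORM>ka-mu</FORM><TRANSL>PST-go</TRANSL></W>
  </S>
  <S id="s2">
    <AUDIO start="2.35" end="4.10"/>
    <FORM>mu lo</FORM>
    <TRANSL>Go there.</TRANSL>
    <W><FORM>mu</FORM><TRANSL>go</TRANSL></W>
    <W><FORM>lo</FORM><TRANSL>there</TRANSL></W>
  </S>
</TEXT>
"""


def sentences(root):
    """Yield (id, start, end, form, translation, [(word, gloss), ...]) per sentence."""
    for s in root.iter("S"):
        audio = s.find("AUDIO")  # sentence-level time alignment
        # findtext("FORM") only looks at direct children, so this is the
        # sentence transcription, not a word form nested inside <W>.
        words = [(w.findtext("FORM"), w.findtext("TRANSL")) for w in s.findall("W")]
        yield (s.get("id"), float(audio.get("start")), float(audio.get("end")),
               s.findtext("FORM"), s.findtext("TRANSL"), words)


def sentence_at(root, t):
    """Return the id of the sentence whose time span contains t (seconds), or None."""
    for sid, start, end, *_ in sentences(root):
        if start <= t < end:
            return sid
    return None


def interlinear(root):
    """Plain-text interlinear view: transcription, word glosses, free translation."""
    lines = []
    for sid, start, end, form, transl, words in sentences(root):
        lines.append(f"[{sid}] {start:.2f}-{end:.2f}  {form}")
        lines.append("    " + "  ".join(f"{f}/{g}" for f, g in words))
        lines.append(f"    '{transl}'")
    return "\n".join(lines)


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE.strip())
    print(sentence_at(root, 3.0))  # s2
    print(interlinear(root))
```

Keeping each annotation at its own level (text, sentence, word) means different views, such as transcription only or full interlinear glossing, can be produced from the same document without duplicating data.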
