Abstract

The creation and dissemination of reproducible research is receiving ever-growing attention in discussions on best practices in publication and education. A key element of these practices is appropriate citation of data sources. In this presentation we describe one scholar-led initiative to increase awareness of the value of data citation in scholarly communication across the discipline of linguistics. 
 Practices in linguistics are varied; it is primarily a data-driven social science, in which inferences about the properties of language, human cognition, cultures and societies are drawn from observations of language. The primary data sets underlying the field are records of these observations in the form of, for instance, texts, audio/video recordings and annotations. While linguists have always relied on language data, they have not always facilitated access to those data in publications (Berez-Kroeker et al. 2018). A great deal of published linguistic research is therefore not reproducible, either in principle or in practice.
 A primary factor hindering reproducible research in linguistics is the lack of standards for data citation in scholarly publishing. Lacking such standards, the field continues to emphasize linguistic analyses over linguistic data, and as a result, linguists have little incentive to make the data behind research publications accessible.
 Since 2015, with funding from the US National Science Foundation, we have worked to develop and promote standards for citing data. We are an international team (Norway, US, Canada, Australia) of scholars, including linguistic data practitioners, scholarly communication librarians, and digital archivists.
 In this presentation we discuss our coordinated efforts over the past four years, including:
Network building

  • Three international workshops to identify technical and sociological barriers to research data citation in linguistics publications.
  • The formation of the Linguistics Data Interest Group (https://rd-alliance.org/groups/linguistics-data-ig) within the Research Data Alliance, with nearly 100 members from the international linguistics scholarly community.

Outreach activities

  • Short-form technical courses and presentations offered through the Linguistic Society of America.

Deliverable products

  • An open-access position paper (Berez-Kroeker et al. 2018).
  • The Austin Principles of Data Citation in Linguistics (http://linguisticsdatacitation.org), which annotate the FORCE11 Joint Declaration of Data Citation Principles (Data Citation Synthesis Group 2014) for linguistic scholarship.
  • Guidelines for citing linguistic data, to be shared in late 2019 with linguistics journal editors and stylesheet curators.
  • The open-access Open Handbook of Linguistic Data Management (MIT Press Open, est. publication date 2020).
 
 With this presentation, we aim to encourage practitioners in other fields to initiate similar advancements, and to encourage decision-makers and publishers to actively collaborate with and support scholar-led initiatives working toward better research practices.

Highlights

  • Background: Linguistics is a data-driven social science in which inferences about cognition and social structure are drawn from observations of language use

  • Outline: linguistics and linguistic data; our project (network building, deliverables and outreach activities); what we have learned

  • Data in publications generally lack citations. Where citations exist, they are only vaguely linked to the actual data set, which makes reproducible research very hard


Summary

Background

Language data are precious:

  • They capture world-views.
  • They capture cultures at a certain point in time, and their contact with each other over time.
  • They capture cognitive capacities and variation (grammar, acoustic properties).

Work at Plenaries and virtually to develop deliverables aimed at linguistic researchers.

  • Set of guidelines to help linguists make informed decisions regarding the accessibility and transparency of their research data.
  • 13 chapters on conceptual foundations of data management for linguistics and best practices.
  • Recommendations for citation of research data in linguistics (working title): a citation model for in-text citations and bibliographic references, including commented examples and elaborated definitions. Intended audience: editors of linguistic publications, researchers, and repositories.
  • PhD summer school on corpus phonology in Lausanne (Switzerland): treatment of acoustic data from A to Z, including transparency of research and best practices of research data management.

  • For many repositories, the best practices of data citation are not reflected in the metadata and documentation guidelines.
  • Continuous outreach seems to move things (slowly) forward.
  • Concrete deliverables are key.
  • Right context: looking for opportunities for outreach.
  • Enough time for presentation and Q&A.
  • Getting the right people on board: trend-setters in the community.
