Abstract
Since 2018, the French Open Science Monitor (BSO) has assessed the effectiveness of national public policy on open science. This steering tool, developed by the French Ministry of Higher Education and Research, the University of Lorraine and Inria, measures the evolution of open science in France using reliable, open and controlled data updated every year. The result is a website presenting different dashboards, tracking for example the ratio of open access scientific publications by year, discipline or publisher. Since its latest release in March 2023, the BSO also tracks the production and openness of research datasets and software mentioned in scientific publications on a national scale. To ensure realistic coverage, our platform relies on large-scale open-source deep learning techniques applied to the full texts of publications with at least one co-author with a French affiliation. DataStet identifies every mention of datasets in scholarly publications, both implicit mentions and explicitly named datasets. SoftCite recognizes software mentions in scientific publications and is trained on the Softcite Dataset. Dataset and software mentions are then automatically characterized as used, created or shared by the research work described in the scientific document. These characterizations can be cumulative: a single mention can carry more than one of them. Of the 1,608,839 publications in our corpus, we were able to analyze 655,954 with our DataStet tool. For this subset, we found 6,511,998 mentions of datasets characterized as used, 330,062 mentions characterized as created, and 78,178 mentions characterized as shared. With this methodology, the BSO can offer new indicators about the proportion of French publications mentioning the usage, creation and sharing of data, as well as the proportion of publications in France that include a "Data Availability Statement". Similar indicators are dedicated to code and software.
In addition, these indicators are broken down by discipline, publisher and institution. The project addresses major technical and organizational challenges: identifying French datasets and software with artificial intelligence, since no reference registries exist for them as they do for publications; and producing relevant indicators for the different scientific communities. Deep learning plays a crucial role as the enabling technology for identifying research datasets and software. This presentation will be an opportunity to present the latest results of the project, to detail the methodology, and finally to underline the reusability of the project results.
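To put the counts quoted in the abstract in proportion, the short Python sketch below derives the share of the corpus that DataStet could analyze and the average number of dataset mentions per analyzed publication. Only the four figures come from the abstract; the function and constant names are hypothetical, and these averages are simple arithmetic, not the BSO's published indicators (which are proportions of publications and cannot be derived from mention counts alone).

```python
# Figures quoted in the abstract (DataStet run on the BSO corpus).
CORPUS_PUBLICATIONS = 1_608_839
ANALYZED_PUBLICATIONS = 655_954
DATASET_MENTIONS = {"used": 6_511_998, "created": 330_062, "shared": 78_178}


def coverage(analyzed: int, total: int) -> float:
    """Share of the corpus that could be analyzed."""
    return analyzed / total


def mentions_per_publication(mentions: dict, analyzed: int) -> dict:
    """Average number of mentions per analyzed publication, by characterization."""
    return {label: count / analyzed for label, count in mentions.items()}


if __name__ == "__main__":
    share = coverage(ANALYZED_PUBLICATIONS, CORPUS_PUBLICATIONS)
    print(f"analyzed share of corpus: {share:.1%}")
    averages = mentions_per_publication(DATASET_MENTIONS, ANALYZED_PUBLICATIONS)
    for label, average in averages.items():
        print(f"{label}: {average:.2f} mentions per analyzed publication")
```

Because the characterizations can be cumulative, the same mention may be counted under more than one label, so the three counts should not be summed to estimate a total number of mentions.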