Abstract

Research datasets include all kinds of objects, from web pages to sensor data, and originate in every domain. Concerns with data generated in large projects and well-funded research areas are centered on their exploration and analysis. For data in the long tail, the main issues are still how to get data visible, satisfactorily described, preserved, and searchable.
 Our work aims to promote data publication in research institutions, considering that researchers are the core stakeholders and need straightforward workflows, and that multi-disciplinary tools can be designed and adapted to specific areas with a reasonable effort. For small groups with interesting datasets but not much time or funding for data curation, we have to focus on engaging researchers in the process of preparing data for publication, while providing them with measurable outputs. In larger groups, solutions have to be customized to satisfy the requirements of more specific research contexts.
 We describe our experience at the University of Porto in two lines of enquiry. For the work with long-tail groups we propose general-purpose tools for data description and the interface to multi-disciplinary data repositories. For areas with larger projects and more specific requirements, namely wind infrastructure, sensor data from concrete structures and marine data, we define specialized workflows. In both cases, we present a preliminary evaluation of results and an estimate of the kind of effort required to keep the proposed infrastructures running. 
 The tools available to researchers can be decisive for their commitment. We focus on data preparation, namely on dataset organization and metadata creation. For groups in the long tail, we propose Dendro, an open-source research data management platform, and explore automatic metadata creation with LabTablet, an electronic laboratory notebook. For groups demanding a domain-specific approach, our analysis has resulted in the development of models and applications to organize the data and support some of their use cases. Overall, we have adopted ontologies for metadata modeling, keeping in sight metadata dissemination as Linked Open Data.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.