Abstract
Scientific datasets have immeasurable value, but they lose their value over time without proper documentation, long-term storage, and easy discovery and access. Across disciplines as diverse as astronomy, demography, archeology, and ecology, large numbers of small heterogeneous datasets (i.e., the long tail of data) are especially at risk unless they are properly documented, saved, and shared. One unifying factor for many of these at-risk datasets is that they reside in spreadsheets. In response to this need, the California Digital Library (CDL) partnered with Microsoft Research Connections and the Gordon and Betty Moore Foundation to create the DataUp data management tool for Microsoft Excel. Many researchers creating these small, heterogeneous datasets use Excel at some point in their data collection and analysis workflow, so we were interested in developing a data management tool that fits easily into those workflows and minimizes the learning curve for researchers. The DataUp project began in August 2011. We first formally assessed the needs of researchers by conducting surveys and interviews of our target research groups: earth, environmental, and ecological scientists. We found that, on average, researchers had very poor data management practices, were not aware of data centers or metadata standards, and did not understand the benefits of data management or sharing. Based on our survey results, we composed a list of desirable components and requirements and solicited feedback from the community to prioritize potential features of the DataUp tool. These requirements were then relayed to the software developers, and DataUp was successfully launched in October 2012.
Highlights
The move towards digital data is ubiquitous across all domains in academic research and scholarship[1,2,3,4,5], and these data can be made available and distributed more quickly than ever before.
Among the most pressing problems associated with the data deluge is good data management: how does one handle the huge volume of available information effectively and efficiently to solve important problems? Knowledge of good data management techniques and software development lags behind the progression of the data deluge.
Michener et al.[14] described the loss of valuable data and insight about those datasets as “information entropy”. This loss of information is becoming increasingly worrisome as data management practices improve very slowly, while the volume of data grows exponentially.
Summary
The move towards digital data is ubiquitous across all domains in academic research and scholarship[1,2,3,4,5], and these data can be made available and distributed more quickly than ever before. Our vision was to promote publishing, archiving, and sharing of tabular data among earth, environmental, oceanographic, and ecological scientists by creating a tool that will integrate into their current workflows and assist them in data management and preservation. This will, in turn, enable faster and more efficient research, thereby increasing the pace of scientific advancement. The resulting DataUp tool facilitates documenting, managing, archiving, and sharing tabular scientific data. It comes in two forms, both open-source: an add-in for Excel and a web-based application. Both the add-in and the web application provide users with the ability to (1) perform a “best practices check” to ensure the data are CSV-compatible; (2) create standardized metadata, or a description of the data, using a wizard-style template; (3) retrieve a unique identifier for their dataset from their chosen data repository; and (4) post their datasets and associated metadata to the repository.
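To make the idea of a CSV-compatibility check more concrete, the following is a minimal Python sketch of the kind of tests such a check might run on tabular data. It is not DataUp's implementation: the function name `check_csv_compatibility` and the specific rules (non-blank unique headers, no blank rows, consistent row lengths) are assumptions chosen for illustration.

```python
import csv
import io


def check_csv_compatibility(text):
    """Return a list of human-readable problems found in CSV text.

    Hypothetical helper for illustration only; the rules below are
    assumed examples of spreadsheet "best practices", not DataUp's own.
    """
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return ["file is empty"]

    problems = []
    header = rows[0]

    # Every column should have a non-blank, unique name.
    seen = set()
    for index, name in enumerate(header, start=1):
        cleaned = name.strip()
        if not cleaned:
            problems.append(f"column {index} has no header")
        elif cleaned in seen:
            problems.append(f"duplicate header '{cleaned}'")
        else:
            seen.add(cleaned)

    # Row numbers count parsed records, with the header as row 1.
    for row_number, row in enumerate(rows[1:], start=2):
        # Blank rows are often used as visual spacing in spreadsheets.
        if all(not cell.strip() for cell in row):
            problems.append(f"row {row_number} is blank")
            continue
        # Every data row should have as many cells as the header.
        if len(row) != len(header):
            problems.append(
                f"row {row_number} has {len(row)} cells, expected {len(header)}"
            )
    return problems
```

Calling `check_csv_compatibility("site,temp,temp\nA,1,2\nB,3\n")` reports the duplicate `temp` header and the short third row, while a clean table returns an empty list. Returning a list of messages rather than raising on the first problem lets a researcher see and fix every issue in one pass.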