Abstract

Data science is more than the analysis of data and the application of statistical techniques. In the life sciences, data scientists play a central role in the day-to-day running of a laboratory. As Figure 1 illustrates, data scientists must accomplish many tasks, ranging from initial experiment planning to ensuring that research is disseminated to the outside world as efficiently as possible. At a time when "open", "reproducible", "linked" and "meaningful" are terms frequently invoked by the scientific community, it can be daunting for these science explorers to navigate the seas of information exchange and annotation requirements. For almost every domain of the life sciences, specific exchange formats, ontologies, minimal information checklists, databases, data management tools and analysis packages exist. Yet awareness of these resources among scientists remains limited, and the lack of adequate software support to enable compliance adds to the tedium of organizing findings and supporting evidence in a form suited for all but the best publication venues. In the present work, we discuss the pipeline a data scientist may follow, the challenges each part of that pipeline brings, and solutions to those challenges. These solutions have been created by the authors and their collaborators from a wider community, the ISA Commons, to help scientists not only get their job done but also create data and metadata that are reproducible, meaningful (to humans and machines), and open to the wider research community for later reuse.
