Abstract

Background: Next-generation sequencing (NGS) has transformed the life sciences, and many research groups are newly dependent upon computer clusters to store and analyze large datasets. This creates challenges for e-infrastructures accustomed to hosting computationally mature research in other sciences. Using data gathered from our own clusters at UPPMAX computing center at Uppsala University, Sweden, where core hour usage of ∼800 NGS and ∼200 non-NGS projects is now similar, we compare and contrast the growth, administrative burden, and cluster usage of NGS projects with projects from other sciences.

Results: The number of NGS projects has grown rapidly since 2010, with growth driven by entry of new research groups. Storage used by NGS projects has grown more rapidly since 2013 and is now limited by disk capacity. NGS users submit nearly twice as many support tickets per user, and 11 more tools are installed each month for NGS projects than for non-NGS projects. We developed usage and efficiency metrics and show that computing jobs for NGS projects use more RAM than non-NGS projects, are more variable in core usage, and rarely span multiple nodes. NGS jobs use booked resources less efficiently for a variety of reasons. Active monitoring can improve this somewhat.

Conclusions: Hosting NGS projects imposes a large administrative burden at UPPMAX due to large numbers of inexperienced users and diverse and rapidly evolving research areas. We provide a set of recommendations for e-infrastructures that host NGS research projects. We provide anonymized versions of our storage, job, and efficiency databases.

Highlights

  • Next-Generation Sequencing (NGS) has transformed the life sciences and many research groups are newly dependent upon computer clusters to store and analyse large datasets

  • Jobs; some efficiency can be gained with user education, but some progress is not possible due to lack of maturity in NGS. We discuss these in more detail below, and conclude with recommendations for high-performance computing (HPC) clusters hosting NGS research computing projects

  • Hosting NGS research carries a large administrative burden, with increased effort arising from management of core hour and storage allocations, user support tickets, and software installations

Summary

Introduction

Next-Generation Sequencing (NGS) has transformed the life sciences and many research groups are newly dependent upon computer clusters to store and analyse large datasets. This creates challenges for e-infrastructures accustomed to hosting computationally mature research in other sciences. Many life science researchers need to become comfortable with command-line interaction with Linux operating systems and research-oriented software tools, a major change in expectations compared with just a few years ago. This contrasts strongly with expectations in research fields that have a longer history of HPC usage, such as physics, computational chemistry, or climate science research, in which the general computational sophistication of researchers and the maturity of software tools are both considerably higher [e.g., 10].
