Walking the Talk: Adopting and Adapting Sustainable Scientific Software Development processes in a Small Biology Lab.

Michael R Crusoe, C Titus Brown

doi:10.5334/jors.35

Abstract

The khmer software project provides both research and production functionality for large-scale nucleic-acid sequence analysis. The software implements several novel data structures and algorithms that perform data pre-filtering for common bioinformatics tasks, including sequence mapping and de novo assembly. Development is driven by a small lab with one full-time developer (MRC), as well as several graduate students and a professor (CTB) who contribute regularly to research features. Here we describe our efforts to bring better design, testing, and more open development to the khmer software project as of version 1.1. The khmer software is developed openly at http://github.com/dib-lab/khmer/.

Highlights

The khmer software was born from a need to more scalably analyze short fixed-length (20–30 character) words, or “k-mers”, in large DNA sequencing data sets.
As data sets have grown in size, approaches to analyzing k-mers have fallen behind the memory and compute scaling curves. khmer provides several functions: approximate k-mer counting using a CountMin Sketch [10], an implementation of a compressible k-mer connectivity graph [8], and a streaming lossy compression algorithm for large data sets [2].
We have developed the khmer software as an open source project from the beginning: the software is under the BSD license, and we use GitHub for most development activities, including coordinating contributions, performing code review, and tagging releases.

Summary

Introduction

The khmer software was born from a need to more scalably analyze short fixed-length (20–30 character) words, or “k-mers”, in large DNA sequencing data sets. khmer provides several functions: approximate k-mer counting using a CountMin Sketch [10], an implementation of a compressible k-mer connectivity graph [8], and a streaming lossy compression algorithm for large data sets [2]. These were first implemented as part of bioinformatics research publications, but because of their broad utility they have since been used in several hundred data analysis publications. The main challenge for us in developing khmer has been to build a stable and reliable software project while simultaneously supporting an energetic research program in bioinformatics. This has traditionally been hard for small scientific labs for many reasons, including lack of expertise and lack of sustained funding.
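
To make the idea of approximate k-mer counting concrete, here is a minimal Python sketch of a CountMin Sketch applied to k-mers. It is an illustration only and not khmer's actual implementation: the names `kmers` and `CountMinSketch`, the table sizes, and the use of salted BLAKE2b hashes are all assumptions chosen for readability.

```python
import hashlib


def kmers(sequence, k):
    """Yield every overlapping substring of length k from a sequence."""
    for i in range(len(sequence) - k + 1):
        yield sequence[i:i + k]


class CountMinSketch:
    """Toy CountMin Sketch: `depth` tables of `width` counters each.

    Hypothetical illustration; sizes and hashing are assumptions.
    """

    def __init__(self, width, depth):
        self.width = width
        self.depth = depth  # must stay below 256 for the one-byte salt used here
        self.tables = [[0] * width for _ in range(depth)]

    def _slots(self, item):
        # Derive one hash per table by salting BLAKE2b with the table index.
        for row in range(self.depth):
            digest = hashlib.blake2b(
                item.encode(), digest_size=8, salt=bytes([row])
            ).digest()
            yield row, int.from_bytes(digest, "big") % self.width

    def add(self, item):
        # Increment one counter in every table.
        for row, col in self._slots(item):
            self.tables[row][col] += 1

    def count(self, item):
        # Collisions can only inflate a counter, so the minimum across
        # tables is the tightest estimate; it never undercounts.
        return min(self.tables[row][col] for row, col in self._slots(item))


# Count the 4-mers of a short example sequence.
sketch = CountMinSketch(width=1000, depth=4)
for kmer in kmers("ACGTACGTAC", 4):
    sketch.add(kmer)
```

Memory use is fixed by `width * depth` regardless of how many distinct k-mers are seen, which is what makes this kind of structure attractive for large data sets. The trade-off is that `count` may overestimate when different k-mers collide in every table.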

Journal: Journal of Open Research Software
Publication Date: Nov 29, 2016
Citations: 7
License type: CC BY 4.0
