Abstract

The command line is a standard way of using the Linux operating system. It offers many features that are essential for editing and analysing data efficiently, which makes it very useful in bioinformatics applications. Commands allow rapid manipulation of large ASCII files or of very large numbers of files, making basic command-line programming skills a critical component of modern life science research. This article is not a guide to Linux commands; in contrast to the many existing Linux manuals, it presents basic command-line tools that are helpful in handling biological sequence data. It provides a collection of simple and popular hacks for users with only basic experience of the Linux command line, together with a description of four data formats popular in bioinformatics applications and examples of editing each of them.
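As a minimal sketch of the kind of file manipulation meant here (an illustrative example, not one of the article's own hacks), the standard `grep` and `wc` utilities can count the records in a FASTA file and the reads in a FASTQ file. The function names `count_fasta_records` and `count_fastq_reads` and the two demo files are hypothetical. The FASTQ count assumes the common layout of exactly four lines per read.

```shell
#!/usr/bin/env bash
# Minimal sketch: count records in FASTA and FASTQ files with standard tools.
# Function names and demo file names are hypothetical, chosen for illustration.

# FASTA: each record starts with exactly one header line beginning with '>',
# so counting header lines counts records even when sequences wrap.
count_fasta_records() {
  grep -c '^>' "$1"
}

# FASTQ: assumes every read spans exactly four lines
# (header, sequence, '+' separator, quality string).
count_fastq_reads() {
  local lines
  lines=$(wc -l < "$1")
  echo $(( lines / 4 ))
}

# Create two tiny demo files in a temporary directory, removed on exit.
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT

printf '>seq1\nACGT\n>seq2\nGGCC\nTTAA\n' > "$tmpdir/example.fa"
printf '@read1\nACGT\n+\nIIII\n@read2\nGGCC\n+\nIIII\n' > "$tmpdir/example.fq"

count_fasta_records "$tmpdir/example.fa"   # prints 2
count_fastq_reads "$tmpdir/example.fq"     # prints 2
```

Counting `>` headers is safe for FASTA because each record has exactly one header line. The analogous `grep -c '^@'` is unsafe for FASTQ, since a quality string may itself begin with `@`. This is why the sketch divides the line count by four instead.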

Highlights

  • Basic programming skills are critical in life science research [1 - 5]

  • Numerous entry-level guides can be found on the internet; rather than duplicating them, we focus primarily on interacting with GNU/Linux via the command line and present a few of the most popular tools for processing biological sequence data, especially large-scale text files, confirming that the Bash scripting language is valued for solving simple, everyday tasks in bioinformatics [8]

  • Since Bash programming skills are required to analyse and present sequence data efficiently, and acquiring them may be a barrier for many researchers [9, 10], this manuscript provides a collection of simple and popular hacks for users with only basic experience of the Linux command line



INTRODUCTION

Basic programming skills are critical in life science research [1-5]. Among the computing environments that scientists have used for years, the UNIX and Linux operating systems have been the most widely used. Since Bash programming skills are required to analyse and present sequence data efficiently, and acquiring them may be a barrier for many researchers [9, 10], this manuscript provides a collection of simple and popular hacks for users with only basic experience of the Linux command line. It describes basic biological data formats and gives examples of how to manipulate these kinds of data.

138 The Open Bioinformatics Journal, 2020, Volume 13

[…] manuscript are available online (https://github.com/bczech/bio-cli).

LINUX COMMAND LINE AVAILABILITY
SELECTED BIOLOGICAL DATA FORMATS
FASTA Format
FASTQ Format
Variant Effect Predictor Software Input and Output Formats
COMMAND LINE TOOLS
Example 1
Example 2
Example 3
Example 4
Example 5
Example 6
Example 7
Example 8
Example 9
BASH SCRIPTS
BIOAWK
Example 10
Example 11
CONCLUSION
