Abstract Motivation Accurate sequence length profiling is essential in bioinformatics, particularly in genomics and proteomics. Existing tools like SeqKit and the Trinity toolkit provide basic sequence statistics but often fall short in offering comprehensive analytics and plotting options. For instance, SeqKit is a very complete and fast tool for sequence analysis, delivering useful metrics (e.g., number of sequences, average, minimum, and maximum lengths) and can return sequences either shorter or longer (but not both at once) for a given length. Similarly, Trinity's Perl-based scripts provide detailed contig length distributions (e.g., N50, median, and average lengths) but do not include the total number of sequences or offer graphical representations of the data. Results Given that key sequence analysis tasks are often distributed across multiple tools, we introduce SeqLengthPlot v2.0, an all-in-one, easy-to-use Python-based tool. Through a simple command-line interface, this straightforward tool enables users to split input FASTA files (nucleotide and protein) into two distinct files based on a customizable sequence length cutoff. It also automatically retrieves the resulting FASTA files, generates length distribution plots, and provides comprehensive statistical summaries. Availability and implementation SeqLengthPlot_v2.0.2 can be accessed at https://github.com/danydguezperez/SeqLengthPlot/releases/tag/v2.0.2.
Read full abstract