Abstract

Mapping reads to transcriptomes is a crucial initial step in bioinformatics RNA-seq pipelines. Because alignment-based methods are computationally expensive, lightweight alignment-free methods are becoming increasingly important. We present RNACache, a novel approach to detecting local similarities between transcriptomes and RNA-seq reads based on context-aware locality sensitive hashing. Its three-step processing pipeline consists of k-mer subsampling, match-based (online) filtering, and coverage-based filtering in order to identify truly expressed transcript isoforms. Our performance evaluation shows that RNACache produces transcriptomic mappings of high accuracy that include significantly fewer erroneous matches than those of the state-of-the-art lightweight mappers RapMap, Salmon, and Kallisto. Furthermore, it scales well with the number of utilized CPU cores and has the best runtime performance at low memory consumption on modern multi-core workstations. This is an extended version of our previously published conference paper (Cascitti et al., 2021). RNACache is available at https://github.com/jcasc/rnacache.
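To illustrate the general idea behind k-mer subsampling with locality sensitive hashing, the following is a minimal sketch in Python. It is not RNACache's implementation: the function names (`kmers`, `sketch`, `build_index`, `query`), the choice of hash function, and all parameter values are assumptions made for illustration. Each reference window is reduced to its `s` smallest k-mer hash values (a minhash-style sketch), and a read is matched to the windows that share sketch values with it. Strand handling and the match-based and coverage-based filtering steps are omitted.

```python
import hashlib
from collections import Counter


def kmers(seq, k):
    """Return all overlapping k-mers of seq (empty if seq is shorter than k)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def kmer_hash(kmer):
    """Hash a k-mer to a 64-bit integer (hash choice is an assumption)."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def sketch(window, k, s):
    """Subsample a window: keep the s smallest distinct k-mer hash values."""
    return sorted({kmer_hash(km) for km in kmers(window, k)})[:s]


def build_index(transcripts, window_len, k, s):
    """Map each sketch value to the (transcript, window) pairs containing it."""
    index = {}
    stride = window_len - k + 1  # windows overlap by k-1 so no k-mer is lost
    for name, seq in transcripts.items():
        starts = range(0, max(len(seq) - k + 1, 1), stride)
        for w, start in enumerate(starts):
            for h in sketch(seq[start:start + window_len], k, s):
                index.setdefault(h, set()).add((name, w))
    return index


def query(read, index, k, s):
    """Count, per (transcript, window), how many sketch values the read shares."""
    hits = Counter()
    for h in sketch(read, k, s):
        for location in index.get(h, ()):
            hits[location] += 1
    return hits
```

In this sketch, a read that is identical to a reference window shares every sketch value with that window, while unrelated windows share few or none. Candidate locations with high hit counts would then be passed on to further filtering.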
