SEquence Evaluation through k-mer Representation (SEEKR) is a method of sequence comparison that uses sequence substrings called k-mers to quantify the nonlinear similarity between nucleic acid species. We describe the development of new functions within SEEKR that enable end-users to estimate P-values that ascribe statistical significance to SEEKR-derived similarities, as well as visualize different aspects of k-mer similarity. We apply the new functions to identify chromatin-enriched lncRNAs that contain XIST-like sequence features, and we demonstrate the utility of applying SEEKR on lncRNA fragments to identify potential RNA-protein interaction domains. We also highlight ways in which SEEKR can be applied to augment studies of lncRNA conservation, and we outline the best practice of visualizing RNA-seq read density to evaluate support for lncRNA annotations before their in-depth study in cell types of interest.
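As a rough illustration of the idea of k-mer-based comparison and of attaching a P-value to a similarity score, the sketch below counts k-mers in two sequences, compares the resulting profiles, and ranks the score against a background set. It is a minimal sketch under stated assumptions, not the SEEKR implementation: the use of Pearson correlation as the similarity measure, the simple length normalization, the empirical P-value with a pseudocount, and the function names `kmer_profile`, `pearson`, and `empirical_p_value` are all assumptions made here for illustration.

```python
from itertools import product
from math import sqrt


def kmer_profile(seq, k=3, alphabet="ACGT"):
    """Return length-normalized counts of every possible k-mer in seq.

    Hypothetical helper for illustration; not part of the SEEKR package.
    """
    seq = seq.upper().replace("U", "T")  # treat RNA and DNA alike
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows with N or other symbols
            counts[window] += 1
    total = max(len(seq) - k + 1, 1)  # number of windows, at least 1
    return [counts[km] / total for km in kmers]


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (assumed metric)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:  # a constant profile has no defined correlation
        return 0.0
    return cov / (sx * sy)


def empirical_p_value(observed, background):
    """Fraction of background similarities at least as large as observed.

    Uses a +1 pseudocount so the estimate is never exactly zero
    (an assumption of this sketch, not a documented SEEKR choice).
    """
    hits = sum(1 for b in background if b >= observed)
    return (hits + 1) / (len(background) + 1)
```

In use, `background` would hold similarities computed between pairs of sequences drawn from a chosen reference set, so the resulting P-value depends on which reference set is used.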