Long non-coding RNAs (lncRNAs) constitute a poorly studied class of transcripts with emerging roles in key cellular processes. Despite efforts to characterize lncRNAs across a wide range of species, these molecules remain largely unexplored in most eukaryotic microbes, including yeast pathogens of the Candida clade. Here, we analyze thousands of publicly available sequencing datasets to infer and characterize the lncRNA repertoires of five major Candida pathogens: Candida albicans, Candida tropicalis, Candida parapsilosis, Candida auris and Candida glabrata. Our results indicate that genomes of these species encode hundreds of lncRNAs that show levels of evolutionary constraint intermediate between those of intergenic genomic regions and protein-coding genes. Despite their low sequence conservation across the studied species, some lncRNAs are syntenic and are enriched in shared sequence motifs. We find co-expression of lncRNAs with certain protein-coding transcripts, hinting at potential functional associations. Finally, we identify lncRNAs that are differentially expressed during infection of human epithelial cells for four of the studied species. Our comprehensive bioinformatic analyses of Candida lncRNAs pave the way for future functional characterization of these transcripts.
Read full abstract