Secondary structure is a principal determinant of lncRNA function, particularly in scaffold formation and in interfaces with target molecules. Noncanonical secondary structures that form in nucleic acids, including G-quadruplexes (G4s), intercalated motifs (iMs), and R-loops (RLs), have known roles in regulating gene expression. In this paper, we used the computational tools G4-iM Grinder and QmRLFS-finder to predict the formation of each of these structures throughout the lncRNA transcriptome in comparison to protein-coding transcripts. The biological importance of the predicted structures in lncRNAs was assessed by combining our results with publicly available lncRNA tissue expression data, followed by pathway analysis. The formation of predicted G4 (pG4) and iM (piM) structures in select lncRNA sequences was tested in vitro using biophysical experiments under near-physiological conditions. We find that the majority of the tested pG4s form highly stable G4 structures, and we identify many previously unreported G4s in biologically important lncRNAs. In contrast, none of the piM sequences are able to form iM structures, consistent with the idea that RNA is unable to form stable iMs. Unexpectedly, these C-rich sequences instead form Z-RNA structures, which have not been previously observed in regions containing cytosine repeats and represent an interesting and underexplored target for protein-RNA interactions. Our results highlight the prevalence and potential structure-associated functions of noncanonical secondary structures in lncRNAs, and show G4 and Z-RNA structure formation in many lncRNA sequences for the first time, furthering the understanding of the structure-function relationship in lncRNAs.