In this chapter, we discuss the potential application of Restricted Boltzmann machines (RBMs) to modeling sequence families of structured RNA molecules. RBMs are simple two-layer machine learning models able to capture intricate sequence dependencies induced by secondary and tertiary structure, as well as by mechanisms of structural flexibility. The resulting models can be used successfully for the design of allosteric RNAs such as riboswitches. RBMs have recently been validated experimentally as generative models for the SAM-I riboswitch aptamer domain sequence family. We introduce RBMs mathematically and practically, providing self-contained code examples to download the necessary training sequence data, train an RBM, and sample novel sequences. We present in detail the implementation of the algorithms needed to use RBMs, focusing on applications in biological sequence modeling.
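To make the "two-layer" structure and the sampling step concrete, the following is a minimal sketch and not the chapter's own code. It assumes categorical (Potts) visible units over a hypothetical alphabet `ACGU-` (nucleotides plus an alignment gap) and, as a simplification, binary hidden units, with energy E(v, h) = -Σ_i g_i(v_i) - Σ_μ θ_μ h_μ - Σ_{i,μ} w_{iμ}(v_i) h_μ. The class `PottsBinaryRBM`, its random untrained parameters, and the helper `decode` are illustrative assumptions. Because there are no connections within a layer, sampling alternates between the hidden layer given the visible layer and the visible layer given the hidden layer (block Gibbs sampling).

```python
import math
import random

ALPHABET = "ACGU-"  # assumed alphabet: four nucleotides plus alignment gap
q = len(ALPHABET)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class PottsBinaryRBM:
    """Hypothetical sketch: L categorical visible units with q states each,
    M binary hidden units. Parameters are random, NOT trained on data."""

    def __init__(self, L, M, seed=0):
        self.L, self.M = L, M
        self.rng = random.Random(seed)
        # visible fields g[i][a]
        self.g = [[self.rng.gauss(0, 0.1) for _ in range(q)] for _ in range(L)]
        # hidden biases theta[mu]
        self.theta = [0.0] * M
        # weights w[i][a][mu] coupling visible state a at site i to hidden unit mu
        self.w = [[[self.rng.gauss(0, 0.1) for _ in range(M)]
                   for _ in range(q)] for _ in range(L)]

    def energy(self, v, h):
        """E(v,h) = -sum_i g_i(v_i) - sum_mu theta_mu h_mu - sum_{i,mu} w_{i mu}(v_i) h_mu."""
        e = -sum(self.theta[mu] * h[mu] for mu in range(self.M))
        for i, a in enumerate(v):
            e -= self.g[i][a]
            e -= sum(self.w[i][a][mu] * h[mu] for mu in range(self.M))
        return e

    def sample_h_given_v(self, v):
        # P(h_mu = 1 | v) = sigmoid(theta_mu + sum_i w_{i mu}(v_i))
        h = []
        for mu in range(self.M):
            inp = self.theta[mu] + sum(self.w[i][a][mu] for i, a in enumerate(v))
            h.append(1 if self.rng.random() < sigmoid(inp) else 0)
        return h

    def sample_v_given_h(self, h):
        # P(v_i = a | h) proportional to exp(g_i(a) + sum_mu w_{i mu}(a) h_mu)
        v = []
        for i in range(self.L):
            logits = [self.g[i][a] + sum(self.w[i][a][mu] * h[mu] for mu in range(self.M))
                      for a in range(q)]
            top = max(logits)  # subtract max for numerical stability
            weights = [math.exp(x - top) for x in logits]
            v.append(self.rng.choices(range(q), weights=weights)[0])
        return v

    def gibbs(self, v, steps):
        """Run `steps` rounds of alternating block Gibbs sampling from v."""
        for _ in range(steps):
            h = self.sample_h_given_v(v)
            v = self.sample_v_given_h(h)
        return v

def decode(v):
    return "".join(ALPHABET[a] for a in v)

# Usage: sample one sequence of length 20 from an untrained RBM with 4 hidden units.
rbm = PottsBinaryRBM(L=20, M=4, seed=1)
v0 = [rbm.rng.randrange(q) for _ in range(rbm.L)]
v_final = rbm.gibbs(v0, steps=100)
seq = decode(v_final)
print(seq)
```

Training (for example, by maximizing the likelihood of an aligned sequence family) is what turns these random parameters into a useful generative model; that step is not shown here. Real RBMs for RNA families often use non-binary hidden-unit potentials, so the binary hidden layer above should be read as a simplification.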