Abstract Information encryption based on DNA data archiving, referred to as DNA encryption, has been advocated for decades and has become highly appealing owing to its remarkable advantages, e.g., high storage capacity, complexity and programmability. Early DNA encryption schemes primarily leveraged the natural four-letter genetic alphabet for data storage, so message-storing DNA sequences could be easily decrypted by routine DNA sequencing; such schemes are consequently vulnerable to attack and face severe security challenges. Here, an unnatural base pair (UBP), dNaM-dTPT3, was introduced into the message and/or index DNA sequences, which can be stored either in vitro or in vivo. This approach achieved the bioorthogonal encryption of “secret” messages, where message DNAs could be selectively, faithfully and readily retrieved or read exclusively in the presence of unnatural bases. Furthermore, a separative computational algorithm, named IM-Codec, was developed to encrypt the data into a “key sequence” (KS) and an “information sequence” (IS) through UBP insertion. Finally, a UBP-based multilevel DNA encryption approach was developed and validated for data encryption and decryption. The employment of the UBP-expanded genetic system for data encryption should provide valuable solutions for archiving highly confidential data and thus usher in a new era of DNA encryption.