Abstract
Simple Summary
Bombyx mori (B. mori), an economically important insect, is famous for its silk. B. mori silk is mainly composed of silk fibroin coated with sericin. Of the silk components, the silk fibroin heavy chain protein, encoded by the silk fibroin heavy chain (FibH) gene, is the most abundant and has the largest molecular weight. At present, apart from the complete FibH sequence of the B. mori strain p50T, no other complete FibH sequence has been reported. This is mainly because the special structure formed by the GC-rich repetitive sequence in FibH hinders polymerase amplification and Sanger sequencing. Here, the FibH sequence of the Dazao strain, which has 99.98% similarity to that of p50T, was obtained by means of Cpf1-based enrichment combined with ONT sequencing (CEO). As far as we know, this is the first complete FibH sequence from a Chinese B. mori strain. Additionally, the methylated CG sites in the FibH repeat unit were identified.
To study the evolution of gene function and of a species, it is essential to characterize the tandem repetitive sequences distributed across the genome. Cas9-based enrichment combined with nanopore sequencing is an important technique for targeting repetitive sequences. Cpf1 has a low molecular weight, low off-target activity, and the same editing efficiency as Cas9. There are numerous studies on enrichment sequencing using Cas9 combined with nanopore sequencing, but only a few on the enrichment sequencing of long and highly repetitive genes using Cpf1. We developed Cpf1-based enrichment combined with Oxford Nanopore Technologies (ONT) sequencing (CEO) to characterize the B. mori FibH gene, a long, GC-rich gene of up to 17 kb that is composed of many repeat units and is not easily amplified by polymerase chain reaction (PCR). CEO has four steps: dephosphorylation of genomic DNA, Cpf1-targeted cleavage of FibH, adapter ligation, and ONT sequencing. Using CEO, we determined the fine structure of B. mori FibH, which is 16,845 bp long and includes 12 repetitive domains separated by amorphous regions. Apart from three differing bases in the intron, the sequence is identical to the reference gene. Surprisingly, many methylated CG sites were found, and they were distributed unevenly across the FibH repeat unit. The CEO method we established is not only a practical means of characterizing highly repetitive genes but also a complement to Cas9-based enrichment.
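The Cpf1 cleavage step depends on choosing guide sites that flank the target gene. The following is a minimal Python sketch, not the authors' pipeline, of how one might scan a flanking sequence for candidate Cpf1 sites and summarize its GC and CG content. It assumes the commonly cited 5'-TTTV-3' PAM (V = A, C, or G) lying directly upstream of the protospacer, a 20 nt protospacer, and a forward-strand-only search; the source does not state which Cpf1 ortholog or guide design was used. All function names and the toy sequence are hypothetical.

```python
import re


def gc_fraction(seq):
    """Fraction of G and C bases in seq (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def count_cg_sites(seq):
    """Number of CG dinucleotides, the sequence context of CG methylation."""
    return seq.upper().count("CG")


def find_cpf1_sites(seq, spacer_len=20):
    """List candidate Cpf1 sites on the forward strand only.

    Assumption: a TTTV PAM sits immediately 5' of the protospacer.
    Returns tuples of (PAM start index, PAM, protospacer).
    """
    seq = seq.upper()
    # The lookahead lets overlapping candidate sites all be reported.
    pattern = re.compile(r"(?=(TTT[ACG])([ACGT]{%d}))" % spacer_len)
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq)]


# Toy flanking sequence (hypothetical, not taken from FibH).
toy = "GG" + "TTTA" + "ACGT" * 5 + "GG"
sites = find_cpf1_sites(toy)
```

On a real locus one would also scan the reverse complement and screen each protospacer for off-target matches elsewhere in the genome; both are omitted here to keep the sketch short.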
Highlights
Tandem repeats are a common class of repetitive DNA sequences in eukaryotes that are arranged consecutively in the genome; they have been called junk or even selfish DNA because most of them cannot encode proteins [1,2]
We established an enrichment sequencing method based on Cpf1 combined with Oxford Nanopore Technologies (ONT) for genes with low complexity and high repetition
Using Cpf1-based enrichment and ONT sequencing (CEO), we described the fine structure and methylation modification of the highly repetitive fibroin heavy chain protein (FibH) gene
Summary
Tandem repeats are a common class of repetitive DNA sequences in eukaryotes that are arranged consecutively in the genome; they have been called junk or even selfish DNA because most of them cannot encode proteins [1,2]. These “junk” DNAs are not useless: they play a very important role in nucleus formation, chromatin rearrangement, tumorigenesis, and gene expression regulation [3,4,5,6,7]. Variation in the number of repeat units in protein-coding genes determines gene polymorphism and affects the function of the gene. Such genes often vary greatly in length and have a complex structure, which often makes them difficult to amplify or to sequence with short reads [18]. Enriching or cloning target DNA fragments before ONT sequencing would efficiently solve this problem.