Inteins are genetic elements, inserted in-frame into protein-coding genes, whose products catalyze their removal from the protein precursor via a protein-splicing reaction. Intein domains can be split into two fragments and still ligate their flanks by a trans-protein-splicing reaction. A bioinformatic analysis of environmental metagenomic data revealed 26 different loci with a novel genomic arrangement. In each locus, a conserved enzyme coding region is broken in two by a split intein, with a free-standing endonuclease gene inserted in between. Eight types of DNA synthesis and repair enzymes have this ‘fractured’ organization. The new types of naturally split inteins were analyzed in comparison to known split inteins. Some loci include apparent gene control elements brought in with the endonuclease gene. A newly predicted homing endonuclease family, related to very short patch repair (Vsr) endonucleases, was found in half of the loci. These putative homing endonucleases also appear in group I introns, and as stand-alone inserts in the absence of surrounding intervening sequences. The new fractured-gene organization appears to be present mainly in phage, shows how endonucleases can integrate into inteins, and may represent a missing link in the evolution of gene breaking in general, and in the creation of split inteins in particular.