The nucleotide sequence of a Clostridium cellulolyticum endo-β-1,4-glucanase (EGCCA)-encoding gene (celCCA), together with its flanking regions, was determined. An open reading frame (ORF) of 1425 bp was found, encoding a protein of 475 amino acids (aa). This ORF began with an ATG start codon and ended with a TAA ochre stop codon. The N-terminal region of the EGCCA protein resembled a typical signal sequence of a Gram-positive bacterial extracellular protein, and a putative signal peptidase cleavage site was identified. EGCCA without its signal peptide was found to be composed of more than 35% hydrophobic aa and to have an Mr of 50715. Comparison of the encoded sequence with other known cellulase sequences revealed three kinds of aa sequence homology. First, a strong homology was found between the C-terminal region of EGCCA, which contains a reiterated stretch of 24 aa, and the conserved reiterated region previously found in four Clostridium thermocellum endoglucanases and one xylanase from the same organism. This region is suspected of playing a role in organizing the cellulosome complex. Second, an extensive homology was found between EGCCA and the N-terminal region of the large endoglucanase, EGE, from C. thermocellum, which suggests that the two genes may share a common ancestor. Third, a region of 21 aa residues beginning at aa +127 was found to be homologous with regions of cellulases from Bacilli, Clostridia and Erwinia chrysanthemi.