Large-scale random cDNA sequencing projects have been started for several organisms and are a valuable tool for analysing quantitative and qualitative aspects of gene expression. However, the reliability of the resulting data is limited, because most clones are only partially sequenced, and on one strand only. As a consequence, the sequence entries derived from random cDNA sequencing projects usually comprise incomplete open reading frames. They nevertheless define complete and reliable coding sequences if two prerequisites are fulfilled: (i) the clones encode very small proteins, and (ii) the clones occur at high frequency in the cDNA libraries. The present study describes the use of cDNA databases to identify homologues of three low-molecular-weight subunits of the mitochondrial bc1 complex, termed the QCR6, QCR9 and QCR10 proteins. These polypeptides have been characterized in only a small number of organisms, have poorly defined functions and show a low degree of structural conservation between species. Several clones were identified for each polypeptide by TBLASTN searches using the known sequences as probes. Most of the database entries contain complete open reading frames, and sequencing errors could be excluded owing to the abundance of the clones. Multiple sequence alignments are presented for all three polypeptides, and consensus sequences are given which may provide a basis for investigating the proteins by site-directed mutagenesis.
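As an illustration of how a consensus sequence can be read off a multiple sequence alignment, the following minimal sketch applies a simple per-column majority rule. It is not the procedure used in the study, which the abstract does not specify. The function name `consensus`, the `threshold` parameter, the `x` placeholder for non-conserved positions and the toy alignment are all assumptions introduced for this example; the sequences are invented and are not QCR6, QCR9 or QCR10 data.

```python
from collections import Counter


def consensus(aligned, threshold=0.5, gap="-", unknown="x"):
    """Return a majority-rule consensus for equal-length aligned sequences.

    A residue is reported at a position only if it occurs in at least
    `threshold` of all sequences (gapped sequences count against it);
    otherwise the hypothetical placeholder `unknown` is emitted.
    """
    if not aligned:
        raise ValueError("need at least one sequence")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must have equal length")

    result = []
    for column in zip(*aligned):
        residues = [r for r in column if r != gap]
        if not residues:
            # Column consists of gaps only: keep the gap symbol.
            result.append(gap)
            continue
        residue, count = Counter(residues).most_common(1)[0]
        result.append(residue if count / len(column) >= threshold else unknown)
    return "".join(result)


# Invented toy alignment, for illustration only.
toy_alignment = ["MKLT-A", "MKITGA", "MRLTGA"]
print(consensus(toy_alignment))  # prints MKLTGA
```

Raising `threshold` makes the consensus stricter: with weakly conserved proteins such as those discussed here, more positions would then be reported as `x`, leaving only the residues shared by most species as candidates for mutagenesis.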