Selenoproteins are biosynthesized by incorporating selenocysteine (Sec) into proteins: an in-frame TGA codon in the open reading frame is not read as a stop signal but is translated into selenocysteine. Because of this dual function of TGA, selenoproteins are mis-annotated or missing in the sequenced genomes of many species, and available computational tools fail to predict them correctly. We therefore developed a new computational method to identify selenoproteins in the genome of Anopheles gambiae. Based on released genomic information, several programs were written in Perl to identify selenocysteine insertion sequence (SECIS) elements, the coding potential of TGA codons, and cysteine-containing homologs of selenoprotein genes. Our results showed that 11,365 genes terminated with a TGA codon, 918 of which contained SECIS elements. A similarity search revealed that 58 genes contained Sec/Cys pairs and similar flanking regions around in-frame TGA codons. Finally, 7 genes were found to fully meet the requirements for selenoproteins, although they have not been annotated as selenoproteins in NCBI databases. Judging from their basic properties, the newly found selenoproteins of Anopheles gambiae are possibly related to in vivo oxidation tolerance and protein regulation, and may be relevant to interfering with the vectorial capacity of Anopheles for Plasmodium. This study may also provide a theoretical basis for preventing malaria transmission by Anopheles.
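The TGA-related steps of such a screen can be illustrated with a minimal Python sketch. It is not the authors' Perl programs; the function names (`ends_with_tga`, `translate_tga_as_sec`, `sec_cys_pairs`) are hypothetical, the SECIS search is left out, and the Sec/Cys comparison assumes the two protein sequences are already aligned without gaps.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def ends_with_tga(cds):
    """True if an annotated coding sequence is in frame and terminates with TGA."""
    cds = cds.upper()
    return len(cds) % 3 == 0 and cds.endswith("TGA")


def translate_tga_as_sec(seq):
    """Translate a sequence from position 0, reading TGA as selenocysteine (U).

    Translation stops at TAA or TAG; unknown codons become X.
    """
    seq = seq.upper()
    protein = []
    for i in range(0, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if codon == "TGA":
            protein.append("U")  # candidate Sec position, not a stop
            continue
        aa = CODON_TABLE.get(codon, "X")
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def sec_cys_pairs(candidate, homolog):
    """Positions where the candidate has U and the aligned homolog has C.

    Assumption: both sequences are pre-aligned and gap-free.
    """
    return [i for i, (a, b) in enumerate(zip(candidate, homolog))
            if a == "U" and b == "C"]
```

For example, `translate_tga_as_sec("ATGGCTTGAGGTTAA")` reads through the TGA and yields a protein with `U` at the third position, which `sec_cys_pairs` would then pair with a cysteine at the same position in a homolog. A real screen would extend the read-through into the region downstream of the annotated stop and use a proper alignment tool rather than position-by-position comparison.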