FIG. 1. Summary of the structure and coding sequence of the human Gin-1 gene. Sequences of human cDNAs with accession numbers XM_003947.2 (a putative full-length cDNA), BE502574, AW173201.1, AW950418.1, AI631948.1, and AA766836.1 were used to deduce and confirm these data. The full-length protein is 522 amino acids long. The Gin-1 coding region spans nucleotides 36153–15345 in the genomic clone NT_002663.4. Arrowheads and the numbers above them, respectively, indicate the positions and lengths of introns. Several Alu repeats were detected within the two largest introns. Bold letters indicate the region homologous to the most conserved part of the IN domain, detailed in figure 2 and used to obtain the tree shown in figure 3. Amino acids characteristic of the H2C2 and DDE motifs (Khan et al. 1991) and of the most conserved region of the GPY/F module (Malik and Eickbush 1999) are underlined.

Ty3/Gypsy long-terminal-repeat (LTR) retrotransposons are among the best-known transposable elements. They inhabit the genomes of many eukaryotic organisms, such as slime molds, plants, fungi, and animals, including vertebrates (Xiong and Eickbush 1990; Malik and Eickbush 1999; Miller et al. 1999; Marin and Llorens 2000). However, in spite of extensive genomic information, these elements had never been found in mammals. In the process of building a database of integrase domain (IN) sequences, we found an intriguing human sequence very similar to the IN of Ty3/Gypsy elements. It was particularly similar to the IN of the Drosophila melanogaster 412 element (E value = 10^-27). The sequence of the human gene, which we called Gypsy integrase-1, or Gin-1, was reconstructed by combining information from genomic and cDNA sequences present in the National Center for Biotechnology Information databases (online at http://www.ncbi.nlm.nih.gov/; sequences in TIGR and Sanger Center databases did not provide additional information).
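The coordinates quoted in the legend of figure 1 allow a rough size check of the gene. The Python sketch below is only illustrative. It assumes that the quoted coordinates are inclusive, that the descending order simply reflects the orientation of the gene in the clone, and that the coding sequence ends with a single stop codon; none of these points is stated in the text, and the function names are hypothetical.

```python
# Values quoted in the legend of figure 1.
SPAN_START = 36153      # first coordinate of the coding region in the genomic clone
SPAN_END = 15345        # last coordinate of the coding region in the genomic clone
PROTEIN_LENGTH = 522    # amino acids in the full-length protein


def genomic_span(start: int, end: int) -> int:
    """Inclusive length of a span, whichever direction the coordinates run."""
    return abs(start - end) + 1


def coding_nucleotides(n_residues: int, include_stop: bool = True) -> int:
    """Nucleotides needed to encode n_residues (assumption: one stop codon)."""
    return 3 * n_residues + (3 if include_stop else 0)


span = genomic_span(SPAN_START, SPAN_END)
coding = coding_nucleotides(PROTEIN_LENGTH)
intronic_fraction = (span - coding) / span

print(f"genomic span: {span} nt")
print(f"coding sequence: {coding} nt")
print(f"fraction of span not coding: {intronic_fraction:.1%}")
```

Under these assumptions the coding region covers about 20.8 kb of genomic sequence for about 1.6 kb of coding sequence, so most of the span is intronic, which is consistent with the large introns mentioned in the legend.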
Partial mouse, rat, and cow orthologous cDNAs were also detected, as was an apparently full-length mouse cDNA sequence (accession number AK015243). However, the corresponding genomic sequences are not yet available for any of these other mammalian species.

Figure 1 summarizes the structure of the human Gin-1 gene and describes the protein it encodes. The protein has the characteristic H2C2, DDE, and GPY/F motifs found in many retroviral and retrotransposon integrases (Khan et al. 1991; Malik and Eickbush 1999). Homology to the IN of Ty3/Gypsy elements spans the whole protein sequence (fig. 2), strongly suggesting that GIN-1 is also, and exclusively, an integrase. The similarity between the human and mouse genes is that expected for orthologous genes of these two species. Amino acid identity is 446/522 = 85%, while a comparative analysis of 1,138 mouse/human orthologs estimated an average identity in their coding regions of 86.4% (Makalowski and Boguski 1998).

Phylogenetic analyses using IN sequences of the known clades of Ty3/Gypsy elements (Malik and Eickbush 1999), as well as of representative Ty1/Copia elements and retroviruses, confirmed that the putative integrase encoded by Gin-1 is very similar to the IN of Mdg1 clade elements (this clade so far includes D. melanogaster 412 and Mdg1; Malik and Eickbush 1999). Their sequences form a strongly supported monophyletic group