Organization of protein sequences into domain families is a foundation for cataloging and investigating protein functions. However, long-standing strategies based on primary amino acid sequences are blind to the possibility that proteins with dissimilar sequences could have comparable tertiary structures. Building on our recent findings that in silico structural predictions of BEN family DNA-binding domains closely resemble their experimentally determined crystal structures, we exploited the AlphaFold2 database for comprehensive identification of BEN domains. Indeed, we identified numerous novel BEN domains, including members of new subfamilies. For example, while no BEN domain factors had previously been annotated in C. elegans, this species actually encodes multiple BEN proteins. These include key developmental timing genes of orphan domain status, sel-7 and lin-14, the latter being the central target of the founding miRNA lin-4. We also reveal that the domain of unknown function 4806 (DUF4806), which is widely distributed across metazoans, is structurally similar to BEN and constitutes a new subtype. Surprisingly, we find that BEN domains resemble both metazoan and non-metazoan homeodomains in 3D conformation and preserve characteristic residues, indicating that although they cannot be aligned by conventional methods, these DNA-binding modules are probably evolutionarily related. Finally, we broaden the application of structural homology searches by revealing novel human members of DUF3504, which occurs in diverse proteins with presumed or known nuclear functions. Overall, our work substantially expands this recently identified family of transcription factors and illustrates the value of 3D structural predictions for annotating protein domains and interpreting their functions.