Abstract

Topologically associating domains (TADs) are the structural and functional units of the genome. However, the functions of protein-coding genes located in the same or different TADs have not been fully investigated. We compared the functional similarities of protein-coding genes within the same TAD and between different TADs, and likewise within the same gap region (the region between two consecutive TADs) and between different gap regions. We found that protein-coding genes from the same TAD or gap region are more likely to share similar protein functions, and that this trend is stronger for TADs than for gap regions. We further created two types of gene–gene spatial interaction networks: the first type is based on Hi-C contacts, whereas the second type is based on both Hi-C contacts and the relationship of being in the same TAD. A graph auto-encoder was applied to learn the network topology, reconstruct the two types of networks, and predict the functions of the central genes/nodes based on the functions of the neighboring genes/nodes. The second type of network gave better performance. Furthermore, we detected long-range spatially interactive regions based on Hi-C contacts and calculated the functional similarities of the gene pairs from these regions.
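
As an illustration of the two network types described above, the following minimal Python sketch builds both edge sets from toy data. The function name `build_networks`, the `min_contacts` cutoff, and the input layout are hypothetical and are not taken from the paper. In particular, treating the second network as the union of Hi-C edges and same-TAD edges is an assumption about how the two relationships are combined.

```python
from itertools import combinations


def build_networks(hic_contacts, gene_tad, min_contacts=1):
    """Return (type1_edges, type2_edges) as sets of sorted gene-pair tuples.

    hic_contacts: dict mapping (gene, gene) -> Hi-C contact count (toy layout).
    gene_tad:     dict mapping gene -> TAD identifier, or None for gap regions.
    """
    # Type 1: connect two genes when their Hi-C contact count reaches the cutoff.
    type1 = {
        tuple(sorted(pair))
        for pair, count in hic_contacts.items()
        if count >= min_contacts
    }
    # Same-TAD relationship: every pair of genes assigned to the same TAD.
    same_tad = {
        (a, b)
        for a, b in combinations(sorted(gene_tad), 2)
        if gene_tad[a] is not None and gene_tad[a] == gene_tad[b]
    }
    # Type 2 (assumed to be a union): Hi-C edges plus same-TAD edges.
    return type1, type1 | same_tad


# Toy data, invented for illustration only.
hic_contacts = {
    ("geneA", "geneB"): 12,
    ("geneB", "geneC"): 0,
    ("geneA", "geneD"): 5,
}
gene_tad = {"geneA": "TAD1", "geneB": "TAD1", "geneC": "TAD1", "geneD": None}

type1, type2 = build_networks(hic_contacts, gene_tad, min_contacts=1)
```

In this toy case the second network gains the edges `geneA`–`geneC` and `geneB`–`geneC`, which have no Hi-C support but share a TAD. These extra edges are the additional neighborhood information a graph model would see in the second network type.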

Highlights

  • The eukaryotic genome is organized at multiple levels that characterize chromosome conformation, including nucleosomes, topologically associating domains (TADs), chromosomes, and chromosome territories

  • We gathered all possible pairwise combinations of intra-TAD gene pairs and filtered out the gene pairs with protein sequence identities greater than 90% to remove duplicated genes

  • Functional similarities of gene pairs were calculated in terms of three gene ontology (GO) categories: biological process ontology (BPO), cellular component ontology (CCO), and molecular function ontology (MFO) [31]
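
The pair-gathering step in the second highlight can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name `intra_tad_pairs`, the input dictionaries, and the choice to keep pairs with no recorded identity are all assumptions. The only value taken from the text is the 90% identity cutoff.

```python
from itertools import combinations


def intra_tad_pairs(tad_genes, identity, max_identity=90.0):
    """List (tad_id, gene_a, gene_b) for gene pairs inside the same TAD.

    tad_genes: dict mapping TAD identifier -> list of gene names (toy layout).
    identity:  dict mapping an alphabetically sorted (gene, gene) tuple ->
               protein sequence identity in percent.
    """
    pairs = []
    for tad_id, genes in tad_genes.items():
        # All unordered pairs of genes located in this TAD.
        for a, b in combinations(sorted(genes), 2):
            # Assumption: a pair with no recorded identity is kept (treated as 0%).
            if identity.get((a, b), 0.0) > max_identity:
                continue  # drop likely duplicated genes (identity above the cutoff)
            pairs.append((tad_id, a, b))
    return pairs


# Toy data, invented for illustration only.
tad_genes = {"TAD1": ["g1", "g2", "g3"], "TAD2": ["g4", "g5"]}
identity = {("g1", "g2"): 95.0, ("g1", "g3"): 40.0, ("g4", "g5"): 90.0}

pairs = intra_tad_pairs(tad_genes, identity)
```

Here the pair `g1`–`g2` is discarded because its identity exceeds 90%, while `g4`–`g5` is retained because the filter is strictly "greater than 90%".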


Summary

Introduction

The eukaryotic genome is organized at multiple levels that characterize chromosome conformation, including nucleosomes, topologically associating domains (TADs), chromosomes, and chromosome territories. TADs, with an average length of ~1 Mbp, are a high-level chromatin structure within which genomic loci frequently interact with each other. TADs are highly conserved across cell types and mammalian genomes, which indicates that their chromatin structures are evolutionarily stable [7,16,17]. The insulator-binding protein CTCF, housekeeping genes, and transfer RNAs are enriched at TAD boundaries and are considered key elements in determining the 3D structure of the genome [5]. Ectopic chromosomal contacts and long-range transcriptional misregulation can occur when TAD boundaries are disrupted, which shows that TADs participate in coordinating gene regulation [18]. TADs can coordinate the transcription of spatially proximal genes and the zones of enhancer activity via chromatin loops [18–22] and influence some interrelated functions of transcriptional regulation [23].

