Approximately 90% of Arabidopsis' unique gene content is found in syntenic blocks that were formed during the most recent whole-genome duplication. Within these blocks, 28.6% of the genes have a retained pair; the remaining genes have been lost from one of the two homeologs. We create a minimized genome by condensing local duplications to one gene, removing transposons, and including only genes within blocks defined by retained pairs. We use a moving average of retained and non-retained genes to find clusters of retention, and then identify the types of genes that appear in clusters at frequencies above expectation. Significant clusters of retention exist for almost all chromosomal segments. Detailed alignments show that, for 85% of the genome, one homeolog was preferentially (1.6-fold) targeted for fractionation. This homeolog fractionation bias suggests an epigenetic mechanism. We find that islands of retention contain "connected genes": genes that the gene balance hypothesis predicts will resist removal because the products they encode interact with other products in a dose-sensitive manner, creating a web of dependency. Gene families that are overrepresented in clusters include those encoding components of the proteasome/protein modification complexes, signal transduction machinery, ribosomes, and transcription factor complexes. Gene pair fractionation following polyploidy or segmental duplication leaves a genome enriched for "connected" genes. These clusters of duplicate genes may help explain the evolutionary origin of coregulated chromosomal regions and new functional modules.
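The moving-average step can be illustrated with a minimal sketch. It assumes each chromosome of the minimized genome is an ordered list of genes. Each gene is flagged 1 if its homeologous pair was retained and 0 otherwise. The window width and the cutoff used below are hypothetical choices for illustration; the abstract does not give either. The sketch also omits the significance test that a real analysis would need to compare window averages against the background retention rate (for example, the 28.6% reported above). The function names `moving_average` and `retention_clusters` are hypothetical.

```python
from typing import List, Optional, Tuple


def moving_average(flags: List[int], window: int) -> List[float]:
    """Centered moving average of 0/1 retention flags along one chromosome.

    Windows are truncated at the chromosome ends, so edge values
    average over fewer genes than interior values.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    averages = []
    for i in range(len(flags)):
        lo = max(0, i - half)
        hi = min(len(flags), i + half + 1)
        segment = flags[lo:hi]
        averages.append(sum(segment) / len(segment))
    return averages


def retention_clusters(
    flags: List[int], window: int, cutoff: float
) -> List[Tuple[int, int]]:
    """Return (start, end) gene-index ranges, end exclusive, where the
    moving average of retention flags exceeds a (hypothetical) cutoff."""
    averages = moving_average(flags, window)
    clusters: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, avg in enumerate(averages):
        if avg > cutoff and start is None:
            start = i  # entering a run of high retention
        elif avg <= cutoff and start is not None:
            clusters.append((start, i))  # leaving the run
            start = None
    if start is not None:
        clusters.append((start, len(averages)))  # run reaches chromosome end
    return clusters


# Toy chromosome: 1 = retained pair, 0 = fractionated (lost from one homeolog)
toy = [0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0]
print(retention_clusters(toy, window=3, cutoff=0.5))  # [(3, 7)]
```

On this toy input the run of four retained genes is reported as a single cluster spanning indices 3 to 6. The window smooths the boundaries, so neighbouring genes can enter or leave a cluster depending on the chosen width.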