Chromosome 21 is the smallest human chromosome and represents a model for physical mapping of the entire human genome. Three copies of this autosome cause Down syndrome, the most frequent genetic disorder associated with significant mental retardation. Five years ago, in the framework of the international human genome project, and as an integrated part of the chromosome 21 community, a consortium of academic groups from Japan and Germany was formed to map and sequence this chromosome. Several whole-genome and chromosome-specific DNA libraries were constructed for mapping purposes. Using a combination of nested-deletion [1] and shotgun sequencing approaches, our center determined and analyzed over 17 million bases of high-quality data. In total, over 33.5 million base pairs of DNA, distributed over four contigs, were sequenced with very high accuracy [2]. The largest of these contigs is nearly 25.5 Mb long. Only three small clone gaps remain, which together comprise approximately 100 kb. Thus, we achieved a coverage of 99.7% of 21q. In addition, 211,116 bp from the short arm were sequenced. Analysis of the chromosome revealed far fewer genes than previously estimated. Instead of the 800-1000 genes expected, we found only 225 genes, of which 127 are known genes and 98 are predicted genes. In addition, 59 pseudogenes were identified. The completion of the 21q sequence provides a unique resource for understanding the molecular pathophysiology of Down syndrome, as well as all other monogenic and complex disorders that map to this chromosome, including Alzheimer’s disease, leukemia, autoimmune disease, epilepsy and manic-depressive psychosis. It also stands as a structural framework from which the complete molecular architecture of the chromosome can be determined.
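The 99.7% coverage figure for 21q can be checked from the numbers quoted above. The following is a minimal sketch, assuming that coverage is defined as sequenced bases divided by sequenced bases plus the estimated gap size; the function name and the rounded inputs (33.5 Mb sequenced, about 100 kb of gaps) are illustrative and are not taken from the authors' own pipeline.

```python
def coverage_fraction(sequenced_bp: int, gap_bp: int) -> float:
    """Return the fraction of a region covered by finished sequence.

    Assumes the region's total length is the sequenced length plus the
    estimated size of the remaining gaps.
    """
    if sequenced_bp < 0 or gap_bp < 0:
        raise ValueError("lengths must be non-negative")
    total_bp = sequenced_bp + gap_bp
    if total_bp == 0:
        raise ValueError("region length must be positive")
    return sequenced_bp / total_bp


# Rounded values from the text: "over 33.5 million" bp sequenced on 21q,
# and three clone gaps totalling "approximately 100 kb".
SEQUENCED_21Q_BP = 33_500_000
GAP_BP = 100_000

if __name__ == "__main__":
    fraction = coverage_fraction(SEQUENCED_21Q_BP, GAP_BP)
    print(f"Estimated 21q coverage: {fraction:.1%}")
```

With these rounded inputs the ratio is 33.5 / 33.6, which agrees with the reported 99.7%.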
With the announcement of the first “working draft” of the human genome by the international human genome sequencing consortium this past June, there has been great interest in the total number of genes in the human genome [3]. Current estimates range from as low as 34,000 genes to as high as 140,000 genes. The low gene density of chromosome 21 was quite unexpected. Most striking is a 7-Mb region near the centromere that contains only one gene. This region is much larger than the whole genome of some species, such as Escherichia coli, yet evolutionary processes permitted the existence of such a gene-poor DNA segment. This finding leads us to propose that there are additional large gene-less regions in other human/mammalian chromosomes. By combining the gene numbers of chromosomes 21 and 22 [4] (770 genes; 2-3% of the human genome) and assuming that together they represent the average gene content of the human genome, we estimate that the total number of human genes may be close to 40,000. As part of the international human genome sequencing project, we are also sequencing parts of chromosomes 11 and 18. Our high-throughput sequence production line generates about 10 Mb per day. To handle such a high volume of output, we have developed a system for automated data assembly, annotation and release. All data are released immediately through our website and through the DNA Database of Japan, in accordance with the policies established by the international consortium. In addition to chromosomes 11, 18 and 21, we are also interested in the structure of the entire human genome. In collaboration with the University of Tokyo, we have developed a database called HGREP [5] (Human Genome Reconstruction Project) that contains working draft and finished sequences covering more than 85% of the human genomic sequence. Sequence entries are aligned along the chromosomes based on sequence similarities to STS markers, BAC-end sequences and other entry sequences.
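The genome-wide gene estimate above is a simple proportional extrapolation from chromosomes 21 and 22. The sketch below illustrates that arithmetic under the stated assumption that the two chromosomes carry the average gene content of the genome. The function name is hypothetical, and because the text gives only a 2-3% range for the fraction of the genome represented, the exact value behind the published figure is an assumption here.

```python
def extrapolate_gene_count(genes_in_sample: int, genome_fraction: float) -> float:
    """Scale a gene count from a sample region up to the whole genome.

    Assumes gene density in the sample equals the genome-wide average,
    so total genes = genes in sample / fraction of genome sampled.
    """
    if not 0 < genome_fraction <= 1:
        raise ValueError("genome_fraction must be in (0, 1]")
    return genes_in_sample / genome_fraction


# 770 genes on chromosomes 21 and 22 combined, as quoted in the text.
GENES_CHR21_CHR22 = 770

if __name__ == "__main__":
    # The text says the two chromosomes make up 2-3% of the genome;
    # the intermediate value 2.5% is included only for illustration.
    for fraction in (0.02, 0.025, 0.03):
        estimate = extrapolate_gene_count(GENES_CHR21_CHR22, fraction)
        print(f"fraction {fraction:.1%}: about {estimate:,.0f} genes")
```

Across the quoted 2-3% range, this gives roughly 25,700 to 38,500 genes, so the figure of close to 40,000 corresponds to the lower end of that fraction range.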
Further, biological features, such as genes, gene functions, repeats and CpG islands, are fully annotated on these entries. To identify evolutionarily conserved and biologically important information in the genome, we are pursuing comparative genomic sequence analysis between human and other related genomes, such as those of rodents and primates. We are currently sequencing several regions of the mouse genome, including some regions that correspond to human chromosome 21, and are preparing genomic resources for sequencing of the chimpanzee. Through comparative methods, we are also trying to understand telomeric regions, structures at the ends of chromosomes that play an important role in several biological functions and have been associated with human disease processes such as oncogenesis and cellular aging. The structure and function of all genes and their regulatory regions can only be fully understood by looking at how they interact under different circumstances and in several different backgrounds. Determining which genes and elements are human-specific will lead to a deeper understanding of who we are and will substantially advance the medical sciences.