Databases, Biobanks and International Consortia: Major Resources for Human Population Genomics and Biological Anthropology.
The Human Genome Project, together with the subsequent advent of diverse repositories for storing, sharing, and analyzing biological data, represented a revolution in the way genetic research is conducted. The overwhelming landscape of omics data resources available today provides an exceptional opportunity to drive innovations in human genetics. Here, we report an updated compilation of 117 open-access and active resources containing DNA, RNA, protein and phenotypic data, viewed through the lens of a population geneticist. A deep inspection of these resources from both spatial and temporal perspectives allows us to identify trends in the use of large-scale biological data, as well as the current and future challenges in the field, including the implementation of new technologies, the call for greater diversity in the sampled populations, and the imperative role of bioethics. Addressing these issues requires the integration of biology and medicine, along with the crucial insights gained from population-based research.
- Research Article
- 10.30047/jgmb.200009.0009
- Sep 1, 2000
- Journal of Genetics and Molecular Biology
Chromosome 21 is the smallest human chromosome and represents a model for physical mapping of the entire human genome. Three copies of this autosome cause Down syndrome, the most frequent genetic disorder associated with significant mental retardation. Five years ago, in the framework of the international Human Genome Project, and as an integrated part of the chromosome 21 community, a consortium of academic groups from Japan and Germany was formed to map and sequence this chromosome. Several whole-genome and chromosome-specific DNA libraries were constructed for mapping purposes. Using a combination of nested-deletion [1] and shotgun sequencing approaches, our center determined and analyzed over 17 million bases of high-quality data. In total, over 33.5 million base pairs of DNA, distributed over four contigs, were sequenced with very high accuracy [2]. The largest of these contigs is nearly 25.5 Mb long. Only three small clone gaps remain, which together comprise approximately 100 kb. Thus, we achieved a coverage of 99.7% of 21q. In addition, 211,116 bp from the short arm were also sequenced. Analysis of the chromosome revealed far fewer genes than previously estimated. Instead of the 800-1000 expected genes, we found only 225 genes, of which 127 are known genes and 98 are predicted genes. In addition, 59 pseudogenes were identified. The completion of the 21q sequence provides a unique resource for understanding the molecular pathophysiology of Down syndrome, as well as all other monogenic and complex disorders that map to this chromosome, including Alzheimer’s disease, leukemia, autoimmune disease, epilepsy and manic-depressive psychosis. It also stands as a structural framework from which the complete molecular architecture of the chromosome can be determined.
With the announcement of the first “working draft” of the human genome by the International Human Genome Sequencing Consortium this past June, there has been great interest in the total number of genes in the human genome [3]. Current estimates range from as low as 34,000 genes to as high as 140,000 genes. The low gene density of chromosome 21 was quite unexpected. Most striking is a 7-Mb region near the centromere that contains only one gene. This region is much larger than the whole genome of some species, such as Escherichia coli, yet evolutionary processes permitted the existence of such a gene-poor DNA segment. This finding leads us to propose that there are additional large gene-less regions in other human/mammalian chromosomes. By combining the gene numbers of chromosomes 21 and 22 [4] (770 genes; 2-3% of the human genome) and assuming that together they represent the average gene content of the human genome, we estimate that the total number of human genes may be close to 40,000. As part of the international human genome sequencing project, we are also sequencing parts of chromosomes 11 and 18. Our high-throughput sequence production line generates about 10 Mb per day. In order to handle such a high volume of output, we have developed a system for automated data assembly, annotation and release. All data are released immediately through our website and through the DNA Database of Japan, in accordance with the policies established by the international consortium. In addition to chromosomes 11, 18 and 21, we are also interested in the structure of the entire human genome. In collaboration with the University of Tokyo, we have developed a database called HGREP [5] (Human Genome Reconstruction Project) that contains working draft and finished sequences which cover more than 85% of the human genomic sequence. Sequence entries are aligned along the chromosomes based on sequence similarities to STS markers, BAC-end and other entry sequences.
Further, biological features, such as genes, gene functions, repeats and CpG islands, are fully annotated on these entries. To identify evolutionarily conserved and biologically important information in the genome, we are pursuing comparative genomic sequence analysis between the human genome and related genomes such as those of rodents and primates. We are currently sequencing several regions of the mouse genome, including some regions that are counterparts of human chromosome 21, and are preparing genomic resources for sequencing of the chimpanzee. Through comparative methods, we are also trying to understand telomeric regions, structures at the ends of chromosomes that play an important role in several biological functions and have been associated with human disease processes such as oncogenesis and cellular aging. The structure and function of all genes and their regulatory regions can only be fully understood by looking at how they interact under different circumstances and in several different backgrounds. Determining which genes and elements are human-specific will lead to a deeper understanding of who we are and will advance medical sciences drastically.
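The two headline figures in the abstract above, the 99.7% coverage of 21q and the genome-wide gene estimate, can be checked with simple arithmetic. The following is a minimal Python sketch, not the authors' method: the function names are hypothetical, and it assumes that coverage is the sequenced length divided by sequenced length plus remaining gaps, and that the extrapolation is a plain division of the combined gene count by the stated genome fraction.

```python
def coverage_percent(sequenced_bp: float, gap_bp: float) -> float:
    """Percent of a region covered, assuming region = sequenced bases + gaps."""
    return 100.0 * sequenced_bp / (sequenced_bp + gap_bp)


def extrapolate_gene_count(genes_in_sample: int, genome_fraction: float) -> float:
    """Scale a gene count from a fraction of the genome to the whole genome.

    Assumes the sampled chromosomes have average gene density, as the
    abstract itself assumes for chromosomes 21 and 22.
    """
    if not 0.0 < genome_fraction <= 1.0:
        raise ValueError("genome_fraction must be in (0, 1]")
    return genes_in_sample / genome_fraction


if __name__ == "__main__":
    # Figures quoted in the abstract: 33.5 Mb sequenced, about 100 kb of gaps.
    print(f"21q coverage: {coverage_percent(33.5e6, 100e3):.1f}%")
    # 770 genes on chromosomes 21 and 22, stated as 2-3% of the genome.
    for fraction in (0.02, 0.03):
        estimate = extrapolate_gene_count(770, fraction)
        print(f"fraction {fraction:.0%}: about {estimate:,.0f} genes")
```

Under these assumptions the coverage comes out at roughly 99.7%, matching the abstract, and the gene estimate spans roughly 25,700 to 38,500. The abstract's figure of close to 40,000 corresponds to the 2% end of the stated range.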
- Research Article
22
- 10.1161/circgenetics.108.843946
- Apr 1, 2009
- Circulation: Cardiovascular Genetics
The sequencing of the human genome, the identification of common single-nucleotide polymorphisms (SNPs) and haplotype blocks, and advances in microarray technology have enabled the study of complex diseases at a level of detail not previously imaginable. These have aided in the design and analyses of association and linkage studies of many complex diseases including cardiovascular disease. Recent technological advances have enabled the undertaking of large-scale genome-wide association studies (GWAS) that can assay hundreds of thousands of polymorphic sites on hundreds to thousands of individuals to find genomic regions associated with disease. Although results from these experiments enable the identification of smaller regions of association compared with previous studies, as with all linkage and association studies, there is the need for the further investigation of regions of interest for the causal genes or variants. The purpose of this review is to present a detailed demonstration of how publicly available resources can be used to guide more detailed research into genomic regions of interest identified in linkage and association study data. Large-scale projects, such as the Human Genome Sequencing project [1,2], have generated large volumes and varieties of annotated genomic data, necessitating the development of Internet-based tools to organize these public data and make them practically available. One important tool in human disease research is the web-based graphical genome browser, which uses the human genome sequence as the framework on which to organize genomic annotations, providing various ways for researchers to view and extract important information.
Currently, there are 3 human genome browsers that have been developed for public use: (1) the National Center for Biotechnology Information (NCBI) Map Viewer [3]; (2) the University of California Santa Cruz (UCSC) Genome Browser [4]; and (3) the European Bioinformatics Institute’s Ensembl system [5]. Although these genome browsers share common features and …
- Front Matter
7
- 10.1016/s1471-4914(01)02170-0
- Oct 25, 2001
- Trends in Molecular Medicine
Genetics, genomics and beyond
- Research Article
3
- 10.1097/00001648-200105000-00019
- May 1, 2001
- Epidemiology (Cambridge, Mass.)
Opportunities for population-based studies of complex genetic disorders after the human genome project.
- Research Article
56
- 10.1111/rda.12201
- Aug 21, 2013
- Reproduction in Domestic Animals
Technical advances and developments in the market for genomic tools have facilitated access to whole-genome data across species. Building on the acquired knowledge of genome sequences, large-scale genotyping has been optimized for broad use, so genotype information can be routinely used to predict genetic merit. Genomic selection (GS) refers to the use of aggregates of estimated marker effects as predictors, which allow improved differentiation between individuals at a young age. Realizable benefits of GS are influenced by several factors and vary in quantity and quality between species. General characteristics and challenges of GS in implementation and routine application are described, followed by an overview of the current status of its use, prospects and challenges in important animal species. Genetic gain for a particular trait can be enhanced by shortening the generation interval, increasing selection accuracy and increasing selection intensity, with species- and breed-specific relevance of the determinants. Reliable predictions based on genetic marker effects require assembly of a reference population for linking phenotype and genotype data to allow estimation and regular re-estimation. Experiences from dairy breeding have shown that international collaboration can set the course for fast and successful implementation of innovative selection tools, so genomics may significantly impact the structures of future breeding and breeding programmes. Traits of great and increasing importance, which were difficult to improve in the conventional systems, could be emphasized if continuous availability of high-quality phenotype data can be assured. Equally elaborate strategies for genotyping and phenotyping will allow tailored approaches to balance efficient animal production, sustainability, animal health and welfare in the future.
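The two quantitative ideas in the abstract above can be illustrated with a short Python sketch. It assumes the simplest additive model, in which a genomic breeding value is the sum of allele dosages weighted by estimated marker effects, and it uses the standard breeder's equation for annual genetic gain, which the abstract describes in words but does not name. The function names and every numeric value are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def genomic_breeding_value(dosages: Sequence[int],
                           marker_effects: Sequence[float]) -> float:
    """Aggregate estimated marker effects into one predictor.

    dosages: allele counts per marker (0, 1 or 2) for one animal.
    marker_effects: estimated additive effect per marker, which in practice
    would come from a reference population (assumed given here).
    """
    if len(dosages) != len(marker_effects):
        raise ValueError("one effect per marker is required")
    return sum(d * e for d, e in zip(dosages, marker_effects))


def annual_genetic_gain(intensity: float, accuracy: float,
                        genetic_sd: float, generation_interval: float) -> float:
    """Breeder's equation: gain per year = i * r * sigma_A / L."""
    if generation_interval <= 0:
        raise ValueError("generation_interval must be positive")
    return intensity * accuracy * genetic_sd / generation_interval


if __name__ == "__main__":
    # Hypothetical animal genotyped at three markers.
    print(genomic_breeding_value([2, 1, 0], [0.5, -0.2, 0.1]))
    # Hypothetical scheme: halving the generation interval doubles the gain.
    slow = annual_genetic_gain(1.4, 0.7, 10.0, 6.0)
    fast = annual_genetic_gain(1.4, 0.7, 10.0, 3.0)
    print(slow, fast)
```

The sketch makes the abstract's three levers explicit: gain rises in proportion to selection intensity and accuracy, and falls in proportion to the generation interval.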
- Research Article
3
- 10.1542/pir.20-9-314
- Sep 1, 1999
- Pediatrics in review
Edward R.B. McCabe, MD, PhD, Physician-in-Chief, Mattel Children’s Hospital at UCLA; Professor and Executive Chair, Department of Pediatrics, UCLA School of Medicine, Los Angeles, CA.
Birth defects are the leading cause of infant mortality in the United States, representing more than 20% of all infant deaths. This infant mortality rate from birth defects exceeds that from sudden infant death syndrome, low-birthweight/short gestation, respiratory distress syndrome, and maternal complications. In addition, birth defects and genetic diseases represent major sources of morbidity for those who survive. As our ability increases to care effectively for those who have infectious diseases and other acute illnesses, individuals who have chronic illnesses due to genetic etiologies represent an increasing proportion of patients seen in the general pediatrician’s office. The Human Genome Project was initiated on October 1, 1990, and has a projected funding period of 15 years. The goal is to sequence the entire human genome, representing three billion base pairs that contain the coding sequences for approximately 75,000 genes. During the latter half of this century, investigations into the genetics of disease gathered increasing momentum. In addition to fundamental investigations into human genetics, technologic tools were developed that permitted large-scale genomic sequencing. These tools included the polymerase chain reaction (PCR), which permits amplification of hundreds of thousands or even millions of copies of DNA and requires only limited sequence data for its success; automated DNA sequencing, which allows increased sequence processing and decreased cost compared with manual methods; and improved information systems, which permit sophisticated analysis and assembly of the three billion base pairs of DNA in the human genome.
Thus, the Human Genome Project represents the current chapter in our understanding, but it is neither the first nor the final chapter in this story. Once we know the sequences of all of the human genes, we must learn their functional roles in human development and disease pathogenesis. The Human Genome Project has been referred to as the “moon shot …
- Research Article
115
- 10.1186/1479-7364-5-6-577
- Jan 1, 2011
- Human Genomics
Substantial progress has been made in human genetics and genomics research over the past ten years since the publication of the draft sequence of the human genome in 2001. Findings emanating directly from the Human Genome Project, together with those from follow-on studies, have had an enormous impact on our understanding of the architecture and function of the human genome. Major developments have been made in cataloguing genetic variation, the International HapMap Project, and with respect to advances in genotyping technologies. These developments are vital for the emergence of genome-wide association studies in the investigation of complex diseases and traits. In parallel, the advent of high-throughput sequencing technologies has ushered in the 'personal genome sequencing' era for both normal and cancer genomes, and made possible large-scale genome sequencing studies such as the 1000 Genomes Project and the International Cancer Genome Consortium. The high-throughput sequencing and sequence-capture technologies are also providing new opportunities to study Mendelian disorders through exome sequencing and whole-genome sequencing. This paper reviews these major developments in human genetics and genomics over the past decade.
- News Article
3
- 10.1016/s1471-4922(01)01960-2
- May 1, 2001
- Trends in parasitology
The human genome: what's in it for parasitologists?
- Discussion
27
- 10.1016/s1360-1385(01)02038-6
- Aug 1, 2001
- Trends in Plant Science
Do plants have more genes than humans? Yes, when it comes to ABC proteins
- Research Article
- 10.1089/genbio.2023.29093.edg
- Apr 1, 2023
- GEN Biotechnology
Green Day: An Interview with NHGRI Director Eric Green
- Biography
1
- 10.1086/302354
- Apr 1, 1999
- The American Journal of Human Genetics
Phyllis J. McAlpine, Ph.D., 1941–98: In Memoriam
- Research Article
- 10.1086/500276
- Sep 1, 2006
- The American Journal of Human Genetics
Introductory Speech for Francis S. Collins
- Book Chapter
- 10.1016/b978-0-12-374419-7.00006-8
- Jan 1, 2009
- Molecular Pathology
Chapter 6 - The Human Genome: Implications for the Understanding of Human Disease
- Research Article
31
- 10.1007/s10142-023-00979-4
- Jan 31, 2023
- Functional & Integrative Genomics
Improvements in sequencing technology coupled with dramatic declines in the cost of genome sequencing have led to a proportional growth in the size and number of genetic datasets since the release of the human genetic sequence by The Human Genome Project (HGP) international consortium. The HGP was undeniably a significant scientific success, a turning point in human genetics and the beginning of human genomics. This burst of genetic information has led to a greater understanding of disease pathology and the potential of employing this data to deliver more precise patient care. Hence, the recognition of high-penetrance disease-causing mutations which encode drivers of disease has made the management of most diseases more specific. Nonetheless, while genetic scores are becoming more extensively used, their application in the real world is expected to be limited due to the lack of diversity in the data used to construct them. Underrepresented populations, such as racial and ethnic minorities, low-income individuals, and those living in rural areas, often experience greater health disparities and worse health outcomes compared to the general population. These disparities are often the result of systemic barriers, such as poverty, discrimination, and limited access to healthcare. Addressing health inequity in underrepresented populations requires addressing the underlying social determinants of health and implementing policies and programs which promote health equity and reduce disparities. This can include expanding access to affordable healthcare, addressing poverty and unemployment, and promoting policies that combat discrimination and racism.
- Research Article
- 10.1002/0471142905.hgfores15
- Nov 1, 1997
- Current Protocols in Human Genetics
Human genetics has traveled a remarkable course in this century. Starting with Garrod's recognition that inborn errors could be understood on the basis of Mendel's laws, progress in the field was long punctuated by remarkable flashes of insight but inhibited by the paucity of robust laboratory-based strategies for experimentation. That obstacle collapsed dramatically with the development of methods of recombinant DNA in the 1970s. Now human molecular biology became almost as accessible as that of other organisms, and Pope's admonition that “the proper study of man is man” became infinitely more attainable. Grounded in this powerful new array of experimental approaches, it is fair to say that human genetics is now emerging as the central science of medicine, focused as it is on the study of health and disease at the most basic level of the DNA blueprint. In the mid-1980s the possibility of initiating a coordinated effort to map and sequence the entire human genome was raised. After considerable and sometimes heated discussion in the scientific community, the Human Genome Project was born. Now, only 3 years into a planned 15-year effort, the Genome Project has already made major strides in producing a highly valuable genetic map of man, organizing the physical mapping of whole chromosomes, stimulating major advances in characterizing the genomes of several model organisms, and catalyzing the discovery of a large number of human disease genes by positional cloning. With its emphasis on large-scale efforts and the consequent need for efficiency and high throughput, the Genome Project has also had a noticeable effect on bench research in human genetics—with automation, robotics, and sophisticated computer science now invading the research laboratory where once hand-held pipets with handwritten lab books reigned supreme. 
Thus, the experimental basis of human genetics is in an era of rapid and powerful expansion, and the demand for well-tested and clearly described research protocols is at an all-time high. Many laboratories expend considerable effort developing their own in-house set of such methods; however, the vagaries of trying to test and document such protocols effectively in a small lab often produce suboptimal results, leading many bench researchers to agree with Winston Churchill that “success is nothing more than going from failure to failure with undiminished enthusiasm.” Far better, then, to organize an international effort to assemble the best of such protocols, as exemplified by this volume and its sister publications, Current Protocols in Molecular Biology and Current Protocols in Immunology. The continuously updated nature of the protocol collection (now available on CD-ROM, which will appeal to many high-tech genome laboratories), its thorough coverage of the field, and its responsiveness to user input should make this an invaluable asset to human genetics researchers well into the next century.