The impact of telomere-to-telomere genome assembly in the plant pan-genomics era
Advances in sequencing technologies have made it possible to determine the genome sequences of multiple lines within a single species. Comparative analysis of these sequences reveals all genes present within a species, providing insight into the genetic mechanisms that lead to the establishment of species. Highly accurate pan-genome analysis requires telomere-to-telomere gapless genome assembly, which yields a complete genome sequence covering all chromosomal regions with no undetermined nucleotides. This review describes the genome sequencing technologies and bioinformatics methods required for telomere-to-telomere gapless genome assembly, as well as genetic mapping, which can be used to evaluate the accuracy of such assemblies. Pan-genome analyses may contribute to the understanding of genetic mechanisms not only within a single species but also across species.
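As a rough illustration of what "gapless" and "telomere-to-telomere" mean in practice, the sketch below checks one assembled chromosome sequence for runs of undetermined bases (`N`) and for telomeric repeats at both ends. It is not taken from the review. The repeat unit `TTTAGGG` (the Arabidopsis-type repeat found in many, but not all, plants), the 200 bp end window, the three-unit threshold, and the function name `assess_chromosome` are all assumptions made for this example.

```python
import re

# Assumption: Arabidopsis-type telomere repeat; some plant lineages use other units.
TELOMERE_UNIT = "TTTAGGG"


def revcomp(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def assess_chromosome(seq: str, window: int = 200, min_units: int = 3) -> dict:
    """Report gaps (runs of N) and telomere repeats at both sequence ends.

    `window` and `min_units` are illustrative thresholds, not published values.
    """
    seq = seq.upper()
    # Each gap is reported as a half-open (start, end) interval.
    gaps = [(m.start(), m.end()) for m in re.finditer(r"N+", seq)]
    # On the forward strand the left end shows the reverse-complemented unit.
    left_ok = revcomp(TELOMERE_UNIT) * min_units in seq[:window]
    right_ok = TELOMERE_UNIT * min_units in seq[-window:]
    return {
        "gaps": gaps,
        "left_telomere": left_ok,
        "right_telomere": right_ok,
        "t2t_gapless": not gaps and left_ok and right_ok,
    }


if __name__ == "__main__":
    toy = "CCCTAAA" * 5 + "ACGTACGTAC" * 10 + "TTTAGGG" * 5
    print(assess_chromosome(toy))
```

A real evaluation would also use independent evidence such as the genetic mapping mentioned in the abstract; this check only inspects the sequence itself.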
- Research Article
52
- 10.1101/gr.1759004
- Jan 12, 2004
- Genome research
Comparative analysis of genomic sequences is becoming a standard technique for studying gene regulation. However, only a limited number of tools are currently available for the analysis of multiple genomic sequences. An extensive data set for the testing and training of such tools is provided by the SCL gene locus. Here we have expanded the data set to eight vertebrate species by sequencing the dog SCL locus and by annotating the dog and rat SCL loci. To provide a resource for the bioinformatics community, all SCL sequences and functional annotations, comprising a collation of the extensive experimental evidence pertaining to SCL regulation, have been made available via a Web server. A Web interface to new tools specifically designed for the display and analysis of multiple sequence alignments was also implemented. The unique SCL data set and new sequence comparison tools allowed us to perform a rigorous examination of the true benefits of multiple sequence comparisons. We demonstrate that multiple sequence alignments are, overall, superior to pairwise alignments for identification of mammalian regulatory regions. In the search for individual transcription factor binding sites, multiple alignments markedly increase the signal-to-noise ratio compared to pairwise alignments.
- Research Article
2
- 10.1094/mpmi-07-21-0165-a
- Jul 14, 2022
- Molecular plant-microbe interactions : MPMI
Genome and Transcriptome Sequence Resources and Effector Repertoire of Pythium myriotylum Drechsler.
- Research Article
157
- 10.1016/j.tplants.2019.12.011
- Jan 21, 2020
- Trends in Plant Science
Cotton (Gossypium spp.) is the most important natural fiber crop worldwide. The diversity of Gossypium species also provides an ideal model for investigating evolution and domestication of polyploids. However, the huge and complex cotton genome hinders genomic research. Technical advances in high-throughput sequencing and bioinformatics analysis have now largely overcome these obstacles, bringing about a new era of cotton genomics. Here, we review recent progress in Gossypium genomics based on whole genome sequencing, resequencing, and comparative genomics, which have provided insights about the genomic basis of fiber biogenesis and the landscape of cotton functional genomics. We address current challenges and present multidisciplinary genomics-enabled breeding strategies covering the breadth of high fiber yield, quality, and environmental resilience for future cotton breeding programs.
- Research Article
3
- 10.5897/jcbbr.9000001
- Nov 30, 2011
- Journal of Computational Biology and Bioinformatics Research
Gene sequence analysis is a key step in genomic research that helps in understanding the genome of a species once it has been sequenced. It includes pairwise, comparative and multiple sequence analysis. The Genomes OnLine Database (GOLD) provides information about the number of completed, meta, incomplete and targeted genome projects. GOLD statistics show 2942 completed, 7687 incomplete, 340 meta and 440 targeted genome projects. The Support Vector Machine (SVM) is a widely used technique for analyzing gene expression or microarray data. In the present study, we performed inter- and intra-species comparative nucleic acid and protein sequence analysis of plant antifreeze proteins (AFPs) containing Leucine Rich Repeat (LRR) and Ice-recrystallization Inhibition (IRI) domains. The analysis provides an extensive understanding of their sequence characteristics and helps in their classification and in the production of transgenic constructs to improve agricultural yields. Classification was based on these sequence characteristics: the AFPs from Daucus carota, which bear only LRR domains, were placed in Class I, while AFPs with both LRR and IRI domains from Triticum aestivum, Deschampsia antarctica, Lolium perenne and Hordeum vulgare were placed in Class II. Within Class II, entries with fewer than ten occurrences of IRI were placed in subgroup A, while those with more than ten occurrences of IRI were placed in subgroup B. Entries in A and B with a single LRR pattern were then placed in groups A1 and B1, whereas those with more than one occurrence were placed in groups A2 and B2, respectively. The entries in B1 were further reclassified into groups C1 and C2 based on the conservation of LRR. LRR regions were found to be enriched in alpha and beta sheet, whereas IRI regions contain coil and sheets.
The reported classification scheme and proposed methodology facilitate the identification, annotation and construction of synthetic plant AFPs in the near future. Ongoing efforts are directed towards the development of a comprehensive database integrated with a prediction server for the identification of new classes of plant AFPs and their homology in an extensive manner.
Key words: Antifreeze protein, leucine rich repeat, over-wintering plants, comparative sequence analysis, ice-recrystallization inhibition protein.
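The grouping rules stated in this abstract can be written down as a small decision function. The sketch below is only a reading of the abstract, not the authors' software: the function name `classify_afp` and its integer inputs are hypothetical. The abstract does not say how an entry with exactly ten IRI occurrences is handled, nor what conservation criterion splits B1 into C1 and C2, so both cases are left unresolved here.

```python
from typing import Optional


def classify_afp(lrr_count: int, iri_count: int) -> Optional[str]:
    """Assign a plant AFP to a group from its LRR and IRI occurrence counts.

    Returns None where the abstract gives no rule (no LRR domain, or exactly
    ten IRI occurrences). The B1 -> C1/C2 split is not modeled because the
    conservation criterion is not stated.
    """
    if lrr_count < 1:
        return None  # the scheme only covers LRR-containing AFPs
    if iri_count == 0:
        return "Class I"  # LRR domains only
    if iri_count == 10:
        return None  # abstract says "less than ten" and "more than ten" only
    subgroup = "A" if iri_count < 10 else "B"
    # A single LRR pattern gives suffix 1; more than one gives suffix 2.
    suffix = "1" if lrr_count == 1 else "2"
    return f"Class II, group {subgroup}{suffix}"


if __name__ == "__main__":
    for lrr, iri in [(3, 0), (1, 4), (2, 12), (1, 10)]:
        print(lrr, iri, classify_afp(lrr, iri))
```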
- Research Article
14
- 10.1111/nph.13826
- Dec 22, 2015
- New Phytologist
Opportunities for unlocking the potential of genomics for African trees.
- Research Article
76
- 10.1111/j.1365-313x.2007.03112.x
- May 23, 2007
- The Plant Journal
As part of a larger project to sequence the Populus genome and generate genomic resources for this emerging model tree, we constructed a physical map of the Populus genome, representing one of the few such maps of an undomesticated, highly heterozygous plant species. The physical map, consisting of 2802 contigs, was constructed from fingerprinted bacterial artificial chromosome (BAC) clones. The map represents approximately 9.4-fold coverage of the Populus genome, which has been estimated from the genome sequence assembly to be 485 +/- 10 Mb in size. BAC ends were sequenced to assist long-range assembly of whole-genome shotgun sequence scaffolds and to anchor the physical map to the genome sequence. Simple sequence repeat-based markers were derived from the end sequences and used to initiate integration of the BAC and genetic maps. A total of 2411 physical map contigs, representing 97% of all clones assigned to contigs, were aligned to the sequence assembly (JGI Populus trichocarpa, version 1.0). These alignments represent a total coverage of 384 Mb (79%) of the entire poplar sequence assembly and 295 Mb (96%) of linkage group sequence assemblies. A striking result of the physical map contig alignments to the sequence assembly was the co-localization of multiple contigs across numerous regions of the 19 linkage groups. Targeted sequencing of BAC clones and genetic analysis in a small number of representative regions showed that these co-aligning contigs represent distinct haplotypes in the heterozygous individual sequenced, and revealed the nature of these haplotype sequence differences.
- Research Article
39
- 10.1186/1471-2164-13-653
- Nov 21, 2012
- BMC Genomics
Background: The genus Nelumbo Adans. comprises two living species, N. nucifera Gaertan. (Asian lotus) and N. lutea Pers. (American lotus). A genetic linkage map is an essential resource for plant genetic studies and crop improvement but has not been generated for Nelumbo. We aimed to develop genomic simple sequence repeat (SSR) markers from the genome sequence and construct two genetic maps for Nelumbo to assist genome assembly and integration of a genetic map with the genome sequence.
Results: A total of 86,089 SSR motifs were identified from the genome sequences. Di- and tri-nucleotide repeat motifs were the most abundant, and accounted for 60.73% and 31.66% of all SSRs, respectively. AG/GA repeats constituted 51.17% of dinucleotide repeat motifs, followed by AT/TA (44.29%). Of 500 SSR primers tested, 386 (77.20%) produced scorable alleles with an average of 2.59 per primer, and 185 (37.00%) showed polymorphism among two parental genotypes, N. nucifera ‘Chinese Antique’ and N. lutea ‘AL1’, and six progenies of their F1 population. The normally segregating markers, which comprised 268 newly developed SSRs, 37 previously published SSRs and 53 sequence-related amplified polymorphism markers, were used for genetic map construction. The map for Asian lotus was 365.67 cM with 47 markers distributed in seven linkage groups. The map for American lotus was 524.51 cM, and contained 177 markers distributed in 11 genetic linkage groups. The number of markers per linkage group ranged from three to 34 with an average genetic distance of 3.97 cM between adjacent markers. Moreover, 171 SSR markers contained in linkage groups were anchored to 97 genomic DNA sequence contigs of ‘Chinese Antique’. The 97 contigs were merged into 60 scaffolds.
Conclusion: Genetic mapping of SSR markers derived from sequenced contigs in Nelumbo enabled the associated contigs to be anchored in the linkage map and facilitated assembly of the genome sequences of ‘Chinese Antique’. The present study reports the first construction of genetic linkage maps for Nelumbo, which can serve as reference linkage maps to accelerate germplasm characterization, genetic mapping of traits of economic interest, and molecular breeding with marker-assisted selection.
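To make the SSR identification step concrete, here is a minimal sketch of how perfect di- and tri-nucleotide repeats could be located in a sequence with a regular expression. It is not the pipeline used in the study. The minimum repeat counts (six for dinucleotides, five for trinucleotides), the constant `MIN_REPEATS`, and the function name `find_ssrs` are assumptions chosen for illustration, since the abstract does not state its search criteria.

```python
import re

# Assumption: illustrative thresholds, unit length -> minimum number of repeats.
MIN_REPEATS = {2: 6, 3: 5}


def find_ssrs(seq: str) -> list:
    """Return sorted (start, unit, repeat_count) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in MIN_REPEATS.items():
        # Capture one unit, then require at least (min_rep - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            unit = match.group(1)
            if len(set(unit)) == 1:
                continue  # skip homopolymer runs such as AAAAAA
            hits.append((match.start(), unit, len(match.group(0)) // unit_len))
    return sorted(hits)


if __name__ == "__main__":
    toy = "CCC" + "AG" * 8 + "TTGCA" + "ATC" * 5 + "GG"
    for start, unit, count in find_ssrs(toy):
        print(start, unit, count)
```

Flanking sequence around each hit would then be used for primer design; that step is outside the scope of this sketch.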
- Research Article
5
- 10.1094/mpmi-01-22-0021-a
- Jul 13, 2022
- Molecular Plant-Microbe Interactions®
Genome Sequence Resource of Bacillus velezensis Strain HC-8, a Native Bacterial Endophyte with Biocontrol Potential Against the Honeysuckle Powdery Mildew Causative Pathogen Erysiphe lonicerae var. lonicerae.
- Research Article
6
- 10.1094/phyto-11-21-0490-a
- Apr 29, 2022
- Phytopathology®
An Improved Genome Sequence Resource of Bipolaris maydis, Causal Agent of Southern Corn Leaf Blight.
- Research Article
14
- 10.3390/ijms25137147
- Jun 28, 2024
- International journal of molecular sciences
Beef is a major global source of protein, playing an essential role in the human diet. The worldwide production and consumption of beef continue to rise, reflecting a significant trend. However, despite the critical importance of beef cattle resources in agriculture, the diversity of cattle breeds faces severe challenges, with many breeds at risk of extinction. The initiation of the Beef Cattle Genome Project is crucial. By constructing a high-precision functional annotation map of their genome, it becomes possible to analyze the genetic mechanisms underlying important traits in beef cattle, laying a solid foundation for breeding more efficient and productive cattle breeds. This review details advances in genome sequencing and assembly technologies, iterative upgrades of the beef cattle reference genome, and its application in pan-genome research. Additionally, it summarizes relevant studies on the discovery of functional genes associated with key traits in beef cattle, such as growth, meat quality, reproduction, polled traits, disease resistance, and environmental adaptability. Finally, the review explores the potential of telomere-to-telomere (T2T) genome assembly, structural variations (SVs), and multi-omics techniques in future beef cattle genetic breeding. These advancements collectively offer promising avenues for enhancing beef cattle breeding and improving genetic traits.
- Research Article
231
- 10.1093/nar/gkg579
- Jul 1, 2003
- Nucleic Acids Research
Analysis of multiple sequence alignments can generate important, testable hypotheses about the phylogenetic history and cellular function of genomic sequences. We describe the MultiPipMaker server, which aligns multiple, long genomic DNA sequences quickly and with good sensitivity (available at http://bio.cse.psu.edu/ since May 2001). Alignments are computed between a contiguous reference sequence and one or more secondary sequences, which can be finished or draft sequence. The outputs include a stacked set of percent identity plots, called a MultiPip, comparing the reference sequence with subsequent sequences, and a nucleotide-level multiple alignment. New tools are provided to search MultiPipMaker output for conserved matches to a user-specified pattern and for conserved matches to position weight matrices that describe transcription factor binding sites (singly and in clusters). We illustrate the use of MultiPipMaker to identify candidate regulatory regions in WNT2 and then demonstrate by transfection assays that they are functional. Analysis of the alignments also confirms the phylogenetic inference that horses are more closely related to cats than to cows.
- Book Chapter
- 10.1007/978-3-662-53389-5_8
- Jan 1, 2016
The computational process of reconstructing a genome by assembling large amounts of raw sequencing data into long DNA fragments poses great challenges. This chapter illustrates current genome sequencing technologies and assembly algorithms by example of the tomato genome sequencing project. Over the last decade, “Next Generation Sequencing” technologies have placed great emphasis on efficient library preparation, high throughput and long read length. These developments have pushed the evolution of genome assembly approaches from greedy overlap-layout-consensus approaches that were used to assemble Sanger sequences, to de Bruijn graph and string graph approaches that are currently in use to assemble these new types of sequencing data produced in large volume. Nonetheless, many species still lack a high-quality, gold-standard genome sequence as genome assembly is still far from a solved problem. Several approaches have been developed to estimate the quality of assembled genome sequences and to perform so-called genome finishing, a complicated and costly procedure to complete the unresolved regions of the genome. We expect that within this decade sequencing technologies will undergo another dramatic improvement, resulting in “Third Generation Sequencing” technologies with which chromosomes and genomes can be sequenced in their entirety with high accuracy. Plant breeding will benefit enormously from this development, providing breeders with the tools, data and understanding to design new traits and varieties from natural and induced genetic variation in an entirely rationalized and economical manner, and much beyond our current capabilities. The tomato genome described here was sequenced within an international collaboration and its completion spanned almost a decade. The novel sequencing technologies that were invented and commercialized during the course of this effort resulted in the generation of multiple types of sequence datasets. 
This in turn required development and application of state-of-the-art bioinformatics approaches to process the vast and varied datasets in order to produce a near-complete and high quality genome assembly.
- Research Article
5
- 10.3835/plantgenome2009.02.0004let
- Mar 1, 2009
- The Plant Genome
A Genome May Reduce Your Carbon Footprint
- Research Article
33
- 10.1128/aem.01866-07
- Feb 8, 2008
- Applied and Environmental Microbiology
We previously published a genetic map of Gibberella zeae (Fusarium graminearum sensu lato) based on a cross between Kansas strain Z-3639 (lineage 7) and Japanese strain R-5470 (lineage 6). In this study, that genetic map was aligned with the third assembly of the genomic sequence of G. zeae strain PH-1 (lineage 7) using seven structural genes and 108 sequenced amplified fragment length polymorphism markers. Several linkage groups were combined based on the alignments, the nine original linkage groups were reduced to six groups, and the total size of the genetic map was reduced from 1,286 to 1,140 centimorgans. Nine supercontigs, comprising 99.2% of the genomic sequence assembly, were anchored to the genetic map. Eight markers (four markers from each parent) were not found in the genome assembly, and four of these markers were closely linked, suggesting that >150 kb of DNA sequence is missing from the PH-1 genome assembly. The alignments of the linkage groups and supercontigs yielded four independent sets, which is consistent with the four chromosomes reported for this fungus. Two proposed heterozygous inversions were confirmed by the alignments; otherwise, the colinearity of the genetic and physical maps was high. Two of four regions with segregation distortion were explained by the two selectable markers employed in making the cross. The average recombination rates for each chromosome were similar to those previously reported for G. zeae. Despite an inferred history of genetic isolation of lineage 6 and lineage 7, the chromosomes of these lineages remain homologous and are capable of recombination along their entire lengths, even within the inversions. This genetic map can now be used in conjunction with the physical sequence to study phenotypes (e.g., fertility and fitness) and genetic features (e.g., centromeres and recombination frequency) that do not have a known molecular signature in the genome.
- Research Article
11
- 10.1111/pbi.14075
- Jun 15, 2023
- Plant Biotechnology Journal
The ricebean genome provides insight into Vigna genome evolution and facilitates genetic enhancement.