Abstract
We describe a nitrogenase gene sequence database that facilitates analysis of the evolution and ecology of nitrogen-fixing organisms. The database contains 32 954 aligned nitrogenase nifH sequences linked to phylogenetic trees and associated sequence metadata. The database includes 185 linked multigene entries including full-length nifH, nifD, nifK and 16S ribosomal RNA (rRNA) gene sequences. Evolutionary analyses enabled by the multigene entries support an ancient horizontal transfer of nitrogenase genes between Archaea and Bacteria and provide evidence that nifH has a different history of horizontal gene transfer from the nifDK enzyme core. Further analyses show that lineages in nitrogenase cluster I and cluster III have different rates of substitution within nifD, suggesting that nifD is under different selection pressure in these two lineages. Finally, we find that the genetic divergence of nifH and 16S rRNA genes does not correlate well at sequence dissimilarity values used commonly to define microbial species, as strains having <3% sequence dissimilarity in their 16S rRNA genes can have up to 23% dissimilarity in nifH. The nifH database has a number of uses including phylogenetic and evolutionary analyses, the design and assessment of primers/probes and the evaluation of nitrogenase sequence diversity.
Database URL: http://www.css.cornell.edu/faculty/buckley/nifh.htm
Highlights
Biological nitrogen fixation contributes around half of annual nitrogen inputs into the biosphere [1] and is an important source of nitrogen in natural ecosystems [2] and in agricultural systems [3, 4]
The database we describe includes all nifH sequences present in the GenBank nucleotide database as of 16 May 2012 and includes associated metadata in a searchable framework
The nifH sequences are organized into phylogenetic trees that can be navigated readily in analyses of nitrogenase diversity and evolution and in the evaluation of nifH polymerase chain reaction (PCR) primers
Summary
Biological nitrogen fixation contributes around half of annual nitrogen inputs into the biosphere [1] and is an important source of nitrogen in natural ecosystems [2] and in agricultural systems [3, 4]. Phylogenetic trees can be used to navigate the sequences and to explore phylogenetic patterns found in associated metadata. Preliminary versions of this nifH database, one containing 16 989 and another 23 843 sequences, were used to evaluate the diversity of nifH genes in different environments [15] and to evaluate PCR primers used in environmental surveys of nifH diversity [25]. Multigene database entries for nifH, nifD, nifK and 16S rRNA genes were generated using data from sequenced genomes. There are 7030 sequences (21.35% of the total database) in nifH cluster III and subcluster IA (Supplementary Figure S6), the clusters that contain primarily obligate anaerobes. The database contains 185 multigene entries composed of nifH, nifD, nifK and 16S rRNA genes. We observe that genomes that have >97% similarity in 16S rRNA genes can have up to 23% dissimilarity in their nifH sequences.
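The comparison of 16S rRNA and nifH divergence rests on pairwise sequence dissimilarity between aligned sequences. The Python sketch below shows one simple way such a value could be computed: the uncorrected proportion of differing sites (p-distance), skipping gapped columns. It is an illustration only. The text does not state which distance measure or gap treatment the authors used, the function name `p_distance` is hypothetical, and the short sequences are invented toy data rather than database entries.

```python
def p_distance(seq_a: str, seq_b: str, gap_chars: str = "-.") -> float:
    """Uncorrected pairwise dissimilarity between two aligned sequences.

    Assumption: columns where either sequence has a gap are skipped.
    The source text does not specify how gaps were treated.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")

    compared = 0
    mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in gap_chars or b in gap_chars:
            continue  # ignore gapped alignment columns
        compared += 1
        if a != b:
            mismatches += 1

    if compared == 0:
        raise ValueError("no ungapped positions to compare")
    return mismatches / compared


# Invented toy alignments (not real 16S rRNA or nifH data).
toy_16s_a = "ACGTACGTAC"
toy_16s_b = "ACGTACGTAC"
toy_nifh_a = "ATGGCTAAGC"
toy_nifh_b = "ATGGCAAAGC"

d_16s = p_distance(toy_16s_a, toy_16s_b)
d_nifh = p_distance(toy_nifh_a, toy_nifh_b)

# A pair can fall under the 3% 16S rRNA threshold while its nifH genes differ more.
print(f"16S rRNA dissimilarity: {d_16s:.1%}, nifH dissimilarity: {d_nifh:.1%}")
```

Applied to every pair of multigene entries, a calculation of this kind would give one 16S rRNA value and one nifH value per pair, which is the form of comparison behind the statement that strains under the 3% 16S rRNA threshold can differ by up to 23% in nifH.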