Background: Alkanes are major components of fossil fuels such as crude oil. The alkane monooxygenase encoded by the alkB gene performs the initial step of alkane degradation under aerobic conditions. The alkB gene is well studied because of its ubiquity and the availability of experimental evidence for its function. The alkBFGHJKL and alkST clusters are a special kind of alkB-type alkane hydroxylase system; together they encode all the proteins necessary for converting alkanes into the corresponding fatty acids.

Methods: To explore whether the alkBFGHJKL and alkST clusters are widely distributed, we performed a large-scale analysis of isolate and metagenome-assembled genome data (>390,000 genomes) to identify these clusters, together with the distribution of the corresponding taxa and niches. In this study, a set of alk-genes (including but not limited to alkBGHJ) located near each other on a DNA sequence was defined as an alk-gene cluster. alkB genes with alkGHJ located nearby on a DNA sequence were selected for the investigation of putative alk-clusters.

Results: A total of 120 alk-gene clusters were found in 117 genomes. All 117 genomes are from strains belonging to the α- and γ-proteobacteria. The alkB genes located in alk-gene clusters grouped into a single deeply branching clade. Further analysis showed that closely related species shared similar alk-gene organization types. Although a large number of insertion sequence (IS) elements were observed nearby, they did not lead to a wide spread of the alk-gene cluster. The uneven distribution of these elements indicates that other factors might affect the transmission of alk-gene clusters.

Conclusions: We conducted a systematic bioinformatic analysis of alk-genes located near each other on a DNA sequence. This benchmark dataset of alk-genes can provide a baseline for exploring their evolutionary and ecological importance in future studies.
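
The cluster definition given in the Methods (an alkB gene with alkG, alkH, and alkJ located nearby on a DNA sequence) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the `Gene` record, the `find_alk_clusters` function, and the 10 kb `window` are hypothetical choices made for illustration, and the abstract does not state what distance counts as "nearby".

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Gene:
    """One annotated gene (hypothetical record layout)."""
    contig: str   # identifier of the DNA sequence carrying the gene
    start: int    # start coordinate on the contig
    end: int      # end coordinate on the contig
    name: str     # gene name, e.g. "alkB"


# Genes that must accompany alkB, following the Methods definition.
REQUIRED_NEIGHBORS = {"alkG", "alkH", "alkJ"}


def find_alk_clusters(genes: List[Gene], window: int = 10_000) -> List[List[Gene]]:
    """Return putative alk-gene clusters, one list of genes per alkB anchor.

    Assumption: "nearby" means within `window` bp of the alkB gene on the
    same contig. The threshold is illustrative, not taken from the study.
    """
    # Group genes by contig so that only genes on the same sequence are compared.
    by_contig: Dict[str, List[Gene]] = {}
    for gene in genes:
        by_contig.setdefault(gene.contig, []).append(gene)

    clusters: List[List[Gene]] = []
    for contig_genes in by_contig.values():
        contig_genes.sort(key=lambda g: g.start)
        for anchor in contig_genes:
            if anchor.name != "alkB":
                continue
            # Collect every alk gene overlapping the window around this alkB.
            nearby = [
                g for g in contig_genes
                if g.name.startswith("alk")
                and g.start <= anchor.end + window
                and g.end >= anchor.start - window
            ]
            # Keep the candidate only if alkG, alkH, and alkJ are all present.
            if REQUIRED_NEIGHBORS <= {g.name for g in nearby}:
                clusters.append(nearby)
    return clusters
```

Each returned list holds the alkB anchor together with every alk gene inside the window, ordered by start coordinate, so other members such as alkF or alkK are retained when they are annotated, in line with the "including but not limited to alkBGHJ" wording.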