Abstract
Sequence data have proved very informative for characterizing crop species and resolving their phylogenetic relationships. This study used bioinformatics tools to characterize the matK gene in selected legumes, with the pigeon pea [Cajanus cajan (L.) Millsp.] matK sequence as the query sequence. Nucleotide and amino acid sequences of the matK gene of 10 legumes were retrieved from the NCBI database and analysed for homology, physicochemical properties, motifs, GC content and phylogenetic relationships. The nucleotide and amino acid sequence lengths of the gene differed among the selected legumes: nucleotide length varied between 631 and 1580 bp, while amino acid sequence length varied between 21 and 509 residues. The P. tetragonolobus and C. cajan matK sequences had a percentage identity of 88%, while V. sativa had the lowest percentage identity (70%). The G. tomentella and P. tetragonolobus matK sequences shared the same percentage similarity (91%) with C. cajan, while V. sativa had the lowest similarity (78%) with C. cajan. The motifs predicted were a tyrosine kinase phosphorylation site, N-myristoylation site, N-glycosylation site, protein kinase phosphorylation site, casein kinase II phosphorylation site, and cAMP- and cGMP-dependent protein kinase phosphorylation site. However, a microbodies C-terminal targeting site was predicted only in the amino acid sequences of the matK gene of P. sativum and C. cajan. Phylogenetically, two major clades were revealed, with the P. sativum, V. sativa and C. arietinum matK gene sequences in clade A and those of P. tetragonolobus, C. cajan, G. tomentella, P. vulgaris, V. unguiculata, V. angularis and V. radiata in clade B. Clade A diverged from the ancestral legume approximately 39 MYA, while the legume sequences in clade B diverged from the ancestor about 57 MYA. The GC content of the matK nucleotide sequence was highest in V. sativa (31.37%), with a range across the selected legumes of 7.29%-31.37%. The secondary structure of the matK amino acid sequences in the selected legumes comprised alpha helix (34.14%-41.27%), extended strand (11.56%-20.99%) and random coil (39.48%-51.76%). The major domain architectures found in the amino acid sequences were single and double types. Although the maturase K gene sequences of the selected legumes differ in length, physicochemical properties, GC content and motifs, the results of this study revealed that the C. cajan matK gene sequence is closely related to that of P. tetragonolobus but distant from those of V. unguiculata and P. vulgaris.
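The abstract reports GC content and percentage identity without stating how they were computed, so the following is only a minimal sketch of the usual definitions, not the authors' pipeline. The function names and the toy sequences are hypothetical, and the identity function assumes the two sequences have already been aligned to equal length (an alignment tool such as BLAST may count positions differently).

```python
def gc_content(seq: str) -> float:
    """Return GC content as a percentage of the unambiguous bases (A, C, G, T)."""
    counted = [base for base in seq.upper() if base in "ACGT"]
    if not counted:
        return 0.0
    gc = sum(1 for base in counted if base in "GC")
    return 100.0 * gc / len(counted)


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Return the percentage of identical positions in two pre-aligned sequences.

    Assumption: columns where either sequence has a gap ('-') are skipped.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [
        (x, y)
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x != "-" and y != "-"
    ]
    if not pairs:
        return 0.0
    matches = sum(1 for x, y in pairs if x == y)
    return 100.0 * matches / len(pairs)


if __name__ == "__main__":
    # Toy sequences for illustration only; these are not real matK data.
    print(gc_content("ATGCATTA"))
    print(percent_identity("ATGC-TTA", "ATGAATTA"))
```

Skipping gapped columns is one convention among several; counting gaps as mismatches would lower the reported identity for sequences of very different lengths, such as the partial and complete matK sequences compared here.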
Highlights
The recent upsurge in the application and utilization of molecular/sequence data to systematic and evolutionary queries has led to significant contributions to effective classification of both plants and animals
The nucleotide sequences of the maturase K (matK) genes of P. tetragonolobus, G. tomentella, C. arietinum and V. sativa were the longest, while the P. sativum sequence was the shortest (641 bp)
The same trend was observed for the amino acid sequence lengths of the maturase K (MatK) gene of these legumes, which may stem from the fact that the matK gene sequences of the legumes with longer lengths have been completely sequenced while the shorter ones have only partial coding sequences (CDS)
Summary
The recent upsurge in the application of molecular sequence data to systematic and evolutionary questions has contributed significantly to the effective classification of both plants and animals. Many chloroplast, mitochondrial and nuclear genes have been used to study sequence variation and evolutionary trends at the genus level (Clark et al., 1995; Hsiao et al., 1999). Among these genes, sequences of the rbcL gene have frequently been used and analysed by researchers seeking to understand plant systematics beyond the family level (Donoghue et al., 1992; Chase et al., 1993; Duval et al., 1993). The maturase K (matK) gene, formerly known as orfK, has emerged as a gene of interest with potential in plant molecular systematics and evolution because of its rapid evolution at the nucleotide and corresponding amino acid levels (Johnson and Soltis, 1995; Liang and Hilu, 1996; Miller et al., 2006).