Abstract

BackgroundGene family identification from ESTs can be a valuable resource for analysis of genome evolution but presents unique challenges in organisms for which the entire genome is not yet sequenced. We have developed a novel gene family identification method based on negative selection patterns (NSP) between family members to screen EST-generated contigs. This strategy was tested on five known gene families in Arabidopsis to see if individual paralogs could be identified with accuracy from EST data alone when compared to the actual gene sequences in this fully sequenced genome.ResultsThe NSP method uniquely identified family members in all the gene families tested. Two members of the FtsH gene family, three members each of the PAL, RF1, and ribosomal L6 gene families, and four members of the CAD gene family were correctly identified. Additionally all ESTs from the representative contigs when checked against MapViewer data successfully identify the gene locus predicted.ConclusionWe demonstrate the effectiveness of the NSP strategy in identifying specific gene family members in Arabidopsis using only EST data and we describe how this strategy can be used to identify many gene families in agronomically important crop species where they are as yet undiscovered.

Highlights

  • Gene family identification from Expressed sequence tags (ESTs) can be a valuable resource for analysis of genome evolution but presents unique challenges in organisms for which the entire genome is not yet sequenced

  • Phenylalanine ammonia-lyase gene family The tblastn search of using AtPAL1 protein as query resulted in ESTs and contigs reported previously [28]

  • We report the refinement of using dS/dN ratio rather than a tally of 1st, 2nd, and 3rd position differences as well as the MapViewer results that validate the accuracy of gene family member identification

Read more

Summary

Introduction

Gene family identification from ESTs can be a valuable resource for analysis of genome evolution but presents unique challenges in organisms for which the entire genome is not yet sequenced. We have developed a novel gene family identification method based on negative selection patterns (NSP) between family members to screen EST-generated contigs. This strategy was tested on five known gene families in Arabidopsis to see if individual paralogs could be identified with accuracy from EST data alone when compared to the actual gene sequences in this fully sequenced genome. Others have spawned computational methods to analyze and predict the evolution of gene families in a phylogenetic context [6] or determine clinically relevant sites in a protein sequence where amino acid replacements are likely to have a significant effect on phenotype, including those that may cause genetic diseases [7]

Objectives
Methods
Results
Discussion
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call