Abstract

Background: Non-sequence gene data (images, literature, etc.) can be found in many different public databases. Access to these data is mostly by text based methods using gene names; however, gene annotation is neither complete, nor fully systematic between organisms, and is also not generally stable over time. This provides some challenges for text based access, especially for cross-species searches. We propose a method for non-sequence data retrieval based on sequence similarity, which removes dependence on annotation and text searches. This work was motivated by the need to provide better access to large numbers of in situ images, and the observation that such image data were usually associated with a specific gene sequence. Sequence similarity searches are found in existing gene oriented databases, but mostly give indirect access to non-sequence data via navigational links.

Results: Three applications were built to explore the proposed method: accessing image data, literature and gene names. Searches are initiated with the sequence of the user's gene of interest, which is searched against a database of sequences associated with the target data. The matching (non-sequence) target data are returned directly to the user's browser, organised by sequence similarity. The method worked well for the intended application in image data management. Comparison with text based searches of the image data set showed the accuracy of the method. Applied to literature searches it facilitated retrieval of mostly high relevance references. Applied to gene name data it provided a useful analysis of name variation of related genes within and between species.

Conclusion: This method makes a powerful and useful addition to existing methods for searching gene data based on text retrieval or curated gene lists. In particular the method facilitates cross-species comparisons, and enables the handling of novel or otherwise un-annotated genes. Applications using the method are quick and easy to build, and the data require little maintenance. This approach largely circumvents the need for annotation, which can be a major obstacle to the development of genomic scale data resources.
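
The retrieval flow described in the abstract (query sequence in, associated non-sequence data out, ordered by similarity) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Record` type, the toy sequences, the file names, and the 0.2 score threshold are all hypothetical, and a simple k-mer Jaccard score stands in for whatever alignment based similarity search a real system would use.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """A non-sequence item tied to a sequence (hypothetical schema)."""
    sequence: str
    species: str
    payload: str  # e.g. an image file, a citation, or a gene name


def kmers(seq: str, k: int = 4) -> set[str]:
    """Return the set of overlapping k-mers in an upper-cased sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def similarity(a: str, b: str, k: int = 4) -> float:
    """Jaccard similarity of k-mer sets: a crude stand-in for an alignment score."""
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


def search(query: str, records: list[Record],
           min_score: float = 0.2) -> list[tuple[float, Record]]:
    """Score every record against the query and return hits, best first.

    No gene names or annotation are consulted: only the sequences.
    """
    hits = [(similarity(query, r.sequence), r) for r in records]
    hits = [h for h in hits if h[0] >= min_score]  # assumed cut-off
    return sorted(hits, key=lambda h: h[0], reverse=True)


# Toy data: made-up sequences and file names, for illustration only.
DB = [
    Record("TTTTCCCCGGGGAAAATTTTC", "Danio rerio", "image_003.png"),
    Record("ATGGCCAAGTTGACGGATCGT", "Mus musculus", "image_002.png"),
    Record("ATGGCCAAGTTGACCGATCGT", "Xenopus laevis", "image_001.png"),
]
QUERY = "ATGGCCAAGTTGACCGATCGT"
HITS = search(QUERY, DB)

for score, rec in HITS:
    print(f"{score:.2f}  {rec.species:15s}  {rec.payload}")
```

In this toy run the identical Xenopus sequence ranks first, the near-identical mouse sequence second, and the unrelated sequence is dropped, so data from different species are grouped purely by sequence similarity. A production system would presumably replace `similarity` with a proper alignment tool and store the payloads in a database.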

Highlights

  • Non-sequence gene data can be found in many different public databases

  • The problem is that gene annotation is a work in progress, both conceptually and for specific organisms. Although significant effort has been put into this over recent years, it is clear that gene names (a) are potentially unstable, (b) can be inconsistent between organisms and (c) are not available for the many as yet unknown or novel genes, and that this is likely to remain so for some time to come.

  • We believe the method will have application to almost any collection of sequence based data, and will usefully extend the available repertoire of search tools and methods. The advantages of this method of indirect sequence based retrieval are its independence of gene annotation, the ease of making cross-species comparisons, the elimination of the trial and error associated with gene name based systems, the accessibility of novel or otherwise un-annotated genes, the organisation of retrieved data in an intuitively obvious way, and the ability to build applications quickly, with low maintenance overheads.


Summary

Introduction

Non-sequence gene data (images, literature, etc.) can be found in many different public databases. Access to these data is mostly by text based methods using gene names; however, gene annotation is neither complete, nor fully systematic between organisms, and is not generally stable over time. A survey of image data retrieval methods in existing public databases (see Table 1) showed that the mechanisms for retrieving image data by gene were almost invariably based on gene names or symbols, or parts of gene names. We felt that these name based databases probably required a significant annotation or curation effort to set up, and that, in general, name based methods suffer from a number of drawbacks. Given the incomplete state of gene annotation for Xenopus, we decided to investigate other approaches.
