Abstract

A long‐term goal in renal physiology is to understand the mechanisms involved in collecting duct function and regulation at a cellular and molecular level. The first step in modeling of these mechanisms, which can provide a guide to experimentation, is the generation of a list of model components. We have curated a list of proteins expressed in the rat renal inner medullary collecting duct (IMCD) from proteomic data from 18 different publications. The database has been posted as a public resource at https://hpcwebapps.cit.nih.gov/ESBL/Database/IMCD_Proteome_Database/. It includes 8956 different proteins. To search the IMCD Proteomic Database efficiently, we have created a Java‐based program called curated database Basic Local Alignment Search Tool (cdbBLAST), which uses the NCBI BLAST kernel to search for specific amino acid sequences corresponding to proteins in the database. cdbBLAST reports information on the matched protein and identifies proteins in the database that have similar sequences. We have also adapted cdbBLAST to interrogate our previously published IMCD Transcriptome Database. We have made the cdbBLAST program available for use either as a web application or a downloadable .jar file at https://hpcwebapps.cit.nih.gov/ESBL/Database/cdbBLAST/. Database searching based on protein sequence removes ambiguities arising from the standard search method based on official gene symbols and allows the user efficient identification of related proteins that may fulfill the same functional roles.

Highlights

  • With the advent of large-scale proteomic and transcriptomic experiments for profiling gene expression, data access and integration have become rate-limiting for acquisition of biological knowledge

  • We have created a downloadable GUI version of cdbBLAST, together with downloadable versions of our inner medullary collecting duct (IMCD) proteome and transcriptome databases and a manual explaining how users can create a database from their own data

  • cdbBLAST is a valuable tool that allows scientists to search a database with the least ambiguous search parameter available: the amino acid/nucleotide sequence

Introduction

With the advent of large-scale proteomic and transcriptomic experiments for profiling gene expression, data access and integration have become rate-limiting for acquisition of biological knowledge. Because of the large amount of data within these types of databases, it becomes difficult to find information about a specific protein/transcript. The question becomes: what is the best way to find a particular protein/transcript within the databases? Problems arise when a database is searched using a common name for the protein/transcript in question; common or large proteins have multiple names and can be difficult to find if the user does not search with the name used within the database. Searching using a gene symbol is a better option, but it is not without its own difficulties. As with common names, a well-studied protein can be linked to multiple gene symbols.
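The naming problem described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the records, gene symbols, names, and sequences are hypothetical placeholders, and the position-wise identity score is a simple stand-in for sequence alignment, not the BLAST algorithm that cdbBLAST uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProteinRecord:
    gene_symbol: str  # symbol as stored in the toy database
    name: str         # the single name the toy database happens to use
    sequence: str     # placeholder amino acid string, not a real protein


# Hypothetical records for illustration only.
TOY_DB = [
    ProteinRecord("Gene1", "Water channel alpha", "MKTLLVAGSA"),
    ProteinRecord("Gene2", "Water channel beta", "MKTLLVAGTA"),
    ProteinRecord("Gene3", "Sodium transporter", "MPEQRWYDDL"),
]


def find_by_name(db, query):
    """Exact, case-insensitive name lookup; fails for any alias not stored."""
    q = query.strip().lower()
    return [r for r in db if r.name.lower() == q]


def identity(a, b):
    """Fraction of identical positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def find_by_sequence(db, query, min_identity=1.0):
    """Return records whose sequence matches the query at or above min_identity.

    Only equal-length sequences are compared; a real aligner such as BLAST
    also handles gaps and local matches.
    """
    q = "".join(query.split()).upper()
    return [
        r for r in db
        if len(r.sequence) == len(q) and identity(r.sequence, q) >= min_identity
    ]


# An alias that the database does not store finds nothing.
alias_hits = find_by_name(TOY_DB, "Aqua channel 1")

# The sequence identifies the record regardless of which name the user knows.
exact_hits = find_by_sequence(TOY_DB, "MKTLLVAGSA")

# Lowering the threshold also surfaces the closely related record.
related_hits = find_by_sequence(TOY_DB, "MKTLLVAGSA", min_identity=0.8)
```

In this sketch the alias lookup returns no records, the exact sequence lookup returns the single matching record, and the relaxed threshold additionally returns the related protein that differs at one position.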
