Overcoming the limitations of PDB format

Shifali Mehta, Amardeep Singh

doi:10.24297/ijct.v2i3b.2697

Abstract

The Protein Data Bank (PDB) is a repository for the 3-D structural data of large biological molecules, such as proteins and nucleic acids. The PDB is a key resource in the areas of structural biology and structural genomics. Most major scientific journals, and some funding agencies, such as the NIH in the USA, now require scientists to submit their structure data to the PDB. If the contents of the PDB are thought of as primary data, there are hundreds of derived databases that categorize the data differently. For example, both SCOP and CATH categorize structures according to structural type and assumed evolutionary relationships; GO categorizes structures based on genes. In this paper, we describe how to overcome the limitations of the PDB format.

Highlights

The Protein Data Bank was established at Brookhaven National Laboratory in 1971 as an archive for biological macromolecular crystal structures.
In the 1980s, the number of deposited structures began to increase dramatically. This was due to improved technology for all aspects of the crystallographic process, the addition of structures determined by nuclear magnetic resonance (NMR) methods, and changes in the community's views about data sharing.
A PDB file requires special text-processing functions to extract information for use in bioinformatics applications. There is therefore a need to reduce this text processing and to provide a simple framework in which queries are easy to process and data are represented as objects for maximum maintainability.
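The idea of replacing repeated text processing with objects can be illustrated with a minimal Python sketch. The paper itself works with LINQ and DB4o, so this is an assumed stand-in, not the authors' implementation. ATOM and HETATM records store each field in fixed character columns, so the sketch slices each line once and returns typed `Atom` objects; the `Atom` class and the `parse_atom_line`/`parse_atoms` functions are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class Atom:
    """One ATOM/HETATM record held as a typed object (hypothetical class)."""
    record: str
    serial: int
    name: str
    res_name: str
    chain_id: str
    res_seq: int
    x: float
    y: float
    z: float
    element: str


def parse_atom_line(line: str) -> Atom:
    # PDB coordinate records use fixed columns; each slice is the 1-based
    # column range of the field converted to a 0-based Python slice.
    return Atom(
        record=line[0:6].strip(),        # columns 1-6
        serial=int(line[6:11]),          # columns 7-11
        name=line[12:16].strip(),        # columns 13-16
        res_name=line[17:20].strip(),    # columns 18-20
        chain_id=line[21:22].strip(),    # column 22
        res_seq=int(line[22:26]),        # columns 23-26
        x=float(line[30:38]),            # columns 31-38
        y=float(line[38:46]),            # columns 39-46
        z=float(line[46:54]),            # columns 47-54
        element=line[76:78].strip(),     # columns 77-78 (may be absent)
    )


def parse_atoms(lines: Iterable[str]) -> Iterator[Atom]:
    """Yield an Atom for every ATOM/HETATM line; other record types are skipped."""
    for line in lines:
        if line.startswith(("ATOM  ", "HETATM")):
            yield parse_atom_line(line.rstrip("\n"))
```

Once the records are objects, a query such as "all atoms of chain A" becomes an ordinary expression, for example `[a for a in atoms if a.chain_id == "A"]`, instead of another round of string slicing.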

Summary

INTRODUCTION

The Protein Data Bank was established at Brookhaven National Laboratory in 1971 as an archive for biological macromolecular crystal structures. The dates in PDB file headers record when the atomic coordinates were deposited, modified, and published at the Protein Data Bank, which helps keep the data up to date; identifying these dates gives users helpful information.

A PDB file requires special text-processing functions to extract information for use in bioinformatics applications. There is therefore a need to reduce this text processing and to provide a simple framework in which queries are easy to process and data are represented as objects for maximum maintainability.

LINQ can be used against both relational and object data stores, providing a bridge between them. This is valuable for projects that use both technologies or that migrate from one to the other. LINQ can also serve as an abstraction layer that allows the underlying database technology to be switched. DB4o holds contests in which community members propose the best improvement to a specific aspect of DB4o; the winning suggestions are later integrated into the core code.
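The storage side of this argument, querying the same objects whether they live in an object store or a relational database, can be sketched in Python under several assumptions. Python has no LINQ, so a small `EntryStore` protocol plays the role of the abstraction layer. `ObjectStore` only stands in for an object database such as DB4o (it does not use the DB4o API), and `SqliteStore` uses the standard-library `sqlite3` module as the relational backend. All class and function names are hypothetical. The parser reads only the deposition date from the HEADER record (columns 51-59, format DD-MMM-YY), and the rule for expanding two-digit years is an assumption.

```python
import sqlite3
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Protocol

_MONTHS = {m: i for i, m in enumerate(
    ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
     "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"], start=1)}


@dataclass(frozen=True)
class Entry:
    id_code: str
    classification: str
    deposited: date


def parse_header(line: str) -> Entry:
    """Parse a fixed-column PDB HEADER record into an Entry object."""
    day, mon, yy = line[50:59].split("-")          # columns 51-59, e.g. 30-APR-81
    # Assumed pivot: the PDB began in 1971, so 71-99 -> 19xx, otherwise 20xx.
    year = int(yy) + (1900 if int(yy) >= 71 else 2000)
    return Entry(
        id_code=line[62:66].strip(),               # columns 63-66
        classification=line[10:50].strip(),        # columns 11-50
        deposited=date(year, _MONTHS[mon.upper()], int(day)),
    )


class EntryStore(Protocol):
    """Hypothetical abstraction layer: callers never see the backend."""
    def add(self, entry: Entry) -> None: ...
    def deposited_between(self, start: date, end: date) -> List[Entry]: ...


class ObjectStore:
    """Keeps Entry objects in memory; a stand-in for an object database."""
    def __init__(self) -> None:
        self._entries: List[Entry] = []

    def add(self, entry: Entry) -> None:
        self._entries.append(entry)

    def deposited_between(self, start: date, end: date) -> List[Entry]:
        hits = (e for e in self._entries if start <= e.deposited <= end)
        return sorted(hits, key=lambda e: e.id_code)


class SqliteStore:
    """Stores the same objects as rows of a relational table (stdlib sqlite3)."""
    def __init__(self) -> None:
        self._db = sqlite3.connect(":memory:")
        self._db.execute("CREATE TABLE entry (id_code TEXT PRIMARY KEY, "
                         "classification TEXT, deposited TEXT)")

    def add(self, entry: Entry) -> None:
        self._db.execute("INSERT INTO entry VALUES (?, ?, ?)",
                         (entry.id_code, entry.classification,
                          entry.deposited.isoformat()))

    def deposited_between(self, start: date, end: date) -> List[Entry]:
        # ISO-8601 date strings sort chronologically, so BETWEEN works on TEXT.
        rows = self._db.execute(
            "SELECT id_code, classification, deposited FROM entry "
            "WHERE deposited BETWEEN ? AND ? ORDER BY id_code",
            (start.isoformat(), end.isoformat()))
        return [Entry(i, c, date.fromisoformat(d)) for i, c, d in rows]


def load(store: EntryStore, header_lines: Iterable[str]) -> EntryStore:
    """Parse each HEADER line and add the resulting object to any backend."""
    for line in header_lines:
        store.add(parse_header(line))
    return store
```

Because `load` and the date-range query depend only on the protocol, replacing `ObjectStore()` with `SqliteStore()` changes the storage technology without touching the calling code, which is the property the paragraph attributes to LINQ.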

Conclusion

Future Work
