Abstract

Recent advances in gene synthesis, microfluidics, deep sequencing, and microarray techniques have made it possible to construct and assay large libraries of variant protein sequences. This rapid generation of large sets of mutational data has significantly enhanced researchers' ability to study how proteins function and to engineer proteins with new and improved properties. Although many groups around the world are generating large amounts of protein engineering data, there is no standardized format for reporting this data and no simple mechanism for groups to share the data that they generate. We have developed PEBank (Protein Engineering data Bank), a comprehensive database for protein engineering data where users can store their data as well as query and analyze data submitted by themselves and others. PEBank stores the data in a relational database using a standardized schema that requires full protein sequence information and detailed assay descriptions. These features allow for accurate comparison of measurements made across different proteins and by different groups. PEBank is comprehensive in that it accepts data for several different protein properties, including those related to stability, folding, activity, and binding. PEBank thus provides a central repository for data that is often scattered across many different specialized databases. PEBank features a web interface and a REST API that streamline data deposition and allow for batch input and queries. A suite of analysis tools is provided to allow for discovery and analysis of relationships between mutated sequences. We demonstrate the importance of a standardized format for reporting protein engineering data, one that allows for accurate comparisons between different data sets and enables future data mining and machine learning approaches to be applied.
