Artificial intelligence (AI) models are revolutionising scientific data analysis but are reliant on large training data sets. While artificial training data can be used in the context of NMR processing and data analysis methods, relating NMR parameters back to protein sequence and structure requires experimental data. In this perspective we examine what the biological NMR community needs to do, in order to store and share its data better so that we can make effective use of AI methods to further our understanding of biological molecules. We argue, first, that the community should be depositing much more of its experimental data. In particular, we should be depositing more spectra and dynamics data. Second, the NMR data deposited needs to capture the full information content required to be able to use and validate it adequately. The NMR Exchange Format (NEF) was designed several years ago to do this. The widespread adoption of NEF combined with a new proposal for dynamics data specifications come at the right time for the community to expand its deposition of data. Third, we highlight the importance of expanding and safeguarding our experimental data repository, the Biological Magnetic Resonance Data Bank (BMRB), not only in the interests of NMR spectroscopists, but biological scientists more widely. With this article we invite others in the biological NMR community to champion increased (possibly mandatory) data deposition, to get involved in designing new NEF specifications, and to advocate on behalf of the BMRB within the wider scientific community.
Read full abstract