Abstract

This paper reviews some of the challenges posed by the huge growth of experimental data generated by the new generation of large-scale experiments at UK national facilities at the Rutherford Appleton Laboratory (RAL) site at Harwell near Oxford. Such ‘Big Scientific Data’ comes from the Diamond Light Source and Electron Microscopy Facilities, the ISIS Neutron and Muon Facility and the UK's Central Laser Facility. Increasingly, scientists are now required to use advanced machine learning and other AI technologies both to automate parts of the data pipeline and to help find new scientific discoveries in the analysis of their data. For commercially important applications, such as object recognition, natural language processing and automatic translation, deep learning has made dramatic breakthroughs. Google's DeepMind has now used deep learning technology to develop its AlphaFold tool to make predictions for protein folding. Remarkably, it has been able to achieve some spectacular results for this specific scientific problem. Can deep learning be similarly transformative for other scientific problems? After a brief review of some initial applications of machine learning at the RAL, we focus on challenges and opportunities for AI in advancing materials science. Finally, we discuss the importance of developing some realistic machine learning benchmarks using Big Scientific Data coming from several different scientific domains. We conclude with some initial examples of our ‘scientific machine learning’ benchmark suite and of the research challenges these benchmarks will enable. This article is part of a discussion meeting issue ‘Numerical algorithms for high-performance computational science’.

Highlights

  • This paper reviews some of the challenges posed by the huge growth of experimental data generated by the new generation of large-scale experiments at UK national facilities at the Rutherford Appleton Laboratory (RAL) site at Harwell near Oxford

  • Can deep learning be transformative for other scientific problems? After a brief review of some initial applications of machine learning at the RAL, we focus on challenges and opportunities for AI in advancing materials science

  • We have given some examples of the opportunities for machine learning to play an important role both in the generation and analysis of some of these large datasets


Summary

The deep learning revolution and ‘AI for Science’

It is arguable that the deep learning revolution we are witnessing dates back to the ImageNet database and the AlexNet deep learning network [1].

The rapidly expanding capability of large-scale facilities to analyse material samples means that the demand for robust, automated, on-the-fly analysis is becoming ever more pressing. Examples such as the XAS studies described in the paper show how a fusion of experiment, simulated data and machine learning algorithms can facilitate the rapid interpretation of these rich new data sources.

A new type of image recognition architecture, the mixed-scale dense neural network (MSD-NN), was introduced by researchers at Berkeley Laboratory [57]. This architecture differs from traditional CNNs in several ways. The MSD-NN uses dilated filters rather than traditional convolutional kernels, which means that longer range correlations in images can be captured (Figure 9). In SciML, we are currently exploring the application of MSD-NNs for soft X-ray image segmentation and for a range of materials science classification problems.
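To make the point about dilation concrete, the following is a minimal sketch in plain Python, not the MSD-NN implementation from [57]. It assumes a one-dimensional signal and a hypothetical 3-tap filter, whereas the real network operates on 2D images and also densely connects its layers. The function names `dilated_conv1d` and `receptive_field` are illustrative only. The sketch shows that spacing the filter taps further apart lets the same number of weights relate samples that are further apart.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """'Valid' 1D cross-correlation with taps spaced `dilation` samples apart."""
    span = (len(kernel) - 1) * dilation  # distance between first and last tap
    return [
        sum(w * signal[i + k * dilation] for k, w in enumerate(kernel))
        for i in range(len(signal) - span)
    ]


def receptive_field(kernel_size, dilations):
    """Number of input samples seen by one output after stacking the layers."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


# Hypothetical toy input: a single impulse at index 3.
signal = [0, 0, 0, 1, 0, 0, 0, 0, 0]
kernel = [1, 1, 1]  # assumed 3-tap filter, all weights equal

dense = dilated_conv1d(signal, kernel, dilation=1)  # taps at offsets 0, 1, 2
wide = dilated_conv1d(signal, kernel, dilation=3)   # taps at offsets 0, 3, 6

print(dense)  # each output covers 3 adjacent samples
print(wide)   # each output covers a 7-sample window with the same 3 weights

# Three stacked 3-tap layers: undilated versus dilations 1, 2, 4.
print(receptive_field(3, [1, 1, 1]))
print(receptive_field(3, [1, 2, 4]))
```

In this toy setting, three undilated 3-tap layers see 7 input samples, while the same three layers with dilations 1, 2 and 4 see 15, with no extra weights. This is the sense in which dilated filters can pick up longer range correlations.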

Big scientific data and machine learning benchmarks
Concluding remarks
Findings
