Context. Exoplanet observations are currently analysed with Bayesian retrieval techniques to constrain the physical and chemical properties of their atmospheres. Because the models used to analyse these observations are computationally expensive, a compromise is usually needed between model complexity and computing time. Analyses of observational data from future facilities, such as the James Webb Space Telescope (JWST), will require more complex models. This will increase the computational load of retrievals, prompting the search for a faster approach to interpreting exoplanet observations.

Aims. Our goal is to compare machine learning retrievals of exoplanet transmission spectra with nested sampling (Bayesian retrieval), and to understand whether machine learning can be as reliable as a Bayesian retrieval for a statistically significant sample of spectra while being orders of magnitude faster.

Methods. We generated two grids of synthetic transmission spectra and their corresponding planetary and atmospheric parameters, one using free chemistry models and the other using equilibrium chemistry models. Each grid was subsequently rebinned to simulate both Hubble Space Telescope Wide Field Camera 3 (WFC3) and JWST Near-InfraRed Spectrograph (NIRSpec) observations, yielding four datasets in total. Convolutional neural networks (CNNs) were trained on each of the datasets. For each combination of model type and instrument, we performed retrievals for a set of 1000 simulated observations with both nested sampling and machine learning. We also used both methods to perform retrievals for real WFC3 transmission spectra of 48 exoplanets. Additionally, we carried out experiments to test how robust machine learning and nested sampling are against incorrect assumptions in our models.

Results. Convolutional neural networks reached a lower coefficient of determination between predicted and true values of the parameters than nested sampling. Neither CNNs nor nested sampling systematically reached a lower bias for all parameters. Nested sampling underestimated the uncertainty in ~8% of retrievals, whereas CNNs correctly estimated the uncertainties. When performing retrievals for real WFC3 observations, nested sampling and machine learning agreed within 2σ for ~86% of spectra. When performing retrievals with incorrect assumptions, nested sampling underestimated the uncertainty in ~12% to ~41% of cases, whereas for the CNNs this fraction always remained below ~10%.
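The comparison metrics named in the Results can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the 3σ threshold used to flag an underestimated uncertainty, and the quadrature combination of the two methods' uncertainties in the agreement test are all assumptions made for illustration, and the abstract does not state the exact criteria used.

```python
import numpy as np


def r_squared(y_true, y_pred):
    """Coefficient of determination between true and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    return float(1.0 - ss_res / ss_tot)


def mean_bias(y_true, y_pred):
    """Mean signed offset of the predictions from the true values."""
    diff = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
    return float(np.mean(diff))


def fraction_outside(y_true, y_pred, sigma, n_sigma=3.0):
    """Fraction of retrievals whose true value lies more than n_sigma
    quoted standard deviations from the prediction.
    ASSUMPTION: n_sigma=3 is an illustrative threshold, not the paper's."""
    diff = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
    z = np.abs(diff) / np.asarray(sigma, dtype=float)
    return float(np.mean(z > n_sigma))


def fraction_agreeing(mu_a, sigma_a, mu_b, sigma_b, n_sigma=2.0):
    """Fraction of cases where two methods agree within n_sigma.
    ASSUMPTION: the two uncertainties are combined in quadrature."""
    diff = np.abs(np.asarray(mu_a, dtype=float) - np.asarray(mu_b, dtype=float))
    combined = np.sqrt(np.asarray(sigma_a, dtype=float) ** 2
                       + np.asarray(sigma_b, dtype=float) ** 2)
    return float(np.mean(diff <= n_sigma * combined))
```

Applied per parameter to a set of simulated retrievals, `fraction_outside` yields the kind of percentage quoted for underestimated uncertainties, and `fraction_agreeing` the kind quoted for agreement between the two methods on real spectra.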