Abstract. In engineering, machines are typically built after a careful conception and design process: all components of a system, their roles and the interactions between them are well understood, and digital models of the system often exist before the actual hardware is built. This enables simulations and even feedback loops between the real-world system and its digital model, leading to a digital twin that allows better testing, prediction and understanding of complex effects. In contrast, in Earth sciences, and particularly in ocean sciences, models exist only for certain aspects of the real world, for certain processes, and for some interactions and dependencies between different “components” of the ocean. These individual models cover a wide range of temporal (seconds to millions of years) and spatial (millimetres to thousands of kilometres) scales, are underpinned by a variety of field data, and represent their results in many different ways. A key to enabling digital twins in the oceans is fusion at different levels: in particular, fusion of data sources and modalities, fusion across scales and fusion of differing representations. We outline these challenges and present examples of envisioned digital twins in the oceans that involve remote sensing, underwater photogrammetry and computer vision, focusing on the optical aspects of the digital twinning process. In particular, we look at holistic sensing scenarios for optical properties in coastal waters and for seafloor dynamics at volcanic slopes, and we discuss roadblocks for digital twins as well as potential solutions to increase and widen their use.