Objective knowledge about instrument manoeuvres in endovascular surgery is essential for evaluating surgical skills and developing advanced technologies for cathlab routines. To date, endovascular navigation has been assessed exclusively in laboratory scenarios, whereas the information contained in available fluoroscopy data from clinical cases has been disregarded. In this work, we pioneer a learning-based framework for motion activity recognition in fluoroscopy sequences. The architecture is composed of two networks, for instrument segmentation and action recognition respectively. In this preliminary study, we demonstrate the feasibility of automatically recognising instrument manoeuvres in our ex vivo datasets.

Clinical relevance: The proposed framework contributes to image-based, automated assessment of endovascular tasks. This facilitates robotic control development, surgical education, and smart clinical documentation.