Human-Object Interaction (HOI) detection can provide valuable insight into the meaning of a painting, since the interactions between humans and objects reveal information about the scene, the characters, and the story depicted in the artwork. Automatically detecting HOIs in paintings is challenging: paintings often contain complex scenes with intricate details and wide variation in artistic style. Moreover, unlike real-world images, the scenes depicted in paintings need not obey physical laws, which further complicates detection. This paper proposes a novel system for detecting HOIs in paintings using multi-task learning. The system uses an object detection model to detect instances of human figures and objects, and extracts visual and spatial features from them. The extracted features are then combined to detect HOIs. To improve performance on HOI detection, we train the model in a multi-task learning setting with four different tasks. This allows the model to leverage representations shared across the tasks, improving both the accuracy and the efficiency of HOI detection. To train and test our model, we introduce a new benchmark for HOI detection in paintings, which we call SemArt-HOI, built by augmenting the existing SemArt dataset with instance detection annotations and interaction classes. Our experiments show that our model outperforms the state-of-the-art one-stage transformer-based HOI detection model in both single-task and multi-task settings. Furthermore, our system trains four times faster than the state-of-the-art model and uses fewer resources, which makes it well suited to practical, large-scale HOI detection in paintings.
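The pipeline described above (detect instances, then derive spatial features for human-object pairs) can be illustrated with a minimal sketch. This is not the system's actual implementation: the names `Detection`, `candidate_pairs`, and `spatial_features` are hypothetical, the `"person"` label is an assumed convention for the detector output, and the four features computed here (normalized center offsets, area ratio, and IoU) are a commonly used choice assumed for illustration, since the abstract does not specify which spatial features are used.

```python
from typing import List, NamedTuple, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


class Detection(NamedTuple):
    """Hypothetical container for one detector output."""
    label: str  # e.g. "person", "horse"
    box: Box


def box_area(box: Box) -> float:
    """Area of a box; degenerate (inverted) boxes count as zero."""
    x1, y1, x2, y2 = box
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes."""
    inter = box_area((max(a[0], b[0]), max(a[1], b[1]),
                      min(a[2], b[2]), min(a[3], b[3])))
    union = box_area(a) + box_area(b) - inter
    return inter / union if union > 0 else 0.0


def candidate_pairs(dets: List[Detection]) -> List[Tuple[Detection, Detection]]:
    """Pair every detected human with every other detected instance."""
    return [(h, o)
            for i, h in enumerate(dets) if h.label == "person"
            for j, o in enumerate(dets) if i != j]


def spatial_features(human: Box, obj: Box,
                     width: float, height: float) -> List[float]:
    """Assumed feature set: center offsets (normalized by image size),
    object-to-human area ratio, and IoU of the two boxes."""
    hx, hy = (human[0] + human[2]) / 2, (human[1] + human[3]) / 2
    ox, oy = (obj[0] + obj[2]) / 2, (obj[1] + obj[3]) / 2
    human_area = box_area(human)
    ratio = box_area(obj) / human_area if human_area > 0 else 0.0
    return [(ox - hx) / width, (oy - hy) / height, ratio, iou(human, obj)]


detections = [
    Detection("person", (0.0, 0.0, 10.0, 20.0)),
    Detection("horse", (5.0, 0.0, 25.0, 20.0)),
    Detection("person", (30.0, 0.0, 40.0, 20.0)),
]
pairs = candidate_pairs(detections)
features = spatial_features(pairs[0][0].box, pairs[0][1].box,
                            width=100.0, height=50.0)
```

In a full system, a feature vector like this would presumably be concatenated with visual features of the two instances before interaction classification; that step is omitted here because the abstract does not describe it in detail.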