Abstract
This paper presents a deep cross-modal learning framework for solving geometry problems. Existing geometry solvers are designed for either single-modal or multi-modal problems and cannot be applied interchangeably. We propose a cross-modal learning architecture that effectively solves both text-only and image-text geometry problems. Representing cross-modal features is a key challenge in understanding geometry problems. To address it, we adopt a shared encoder in which text and (or) image features are masked by self-attention units, and a multi-layer transformer realizes the interaction between cross-modal features. Further, a shared decoder decodes either the single-modal features or the concatenated sequence of multi-modal features, depending on whether the encoder input is single-modal or multi-modal. The decoder's representation is passed to task-specific heads for geometry relation extraction, theorem reasoning, and geometry problem solving. The proposed algorithm produces solutions to geometry problems in a readable form. Experimental results show that it performs well on geometry problem solving.
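The abstract does not include code; as a rough illustration of the shared-encoder idea it describes, the following NumPy sketch (all names, dimensions, and the single-head attention are hypothetical simplifications, not the paper's implementation) concatenates optional image tokens with text tokens and passes the combined sequence through one self-attention step, so the same encoder path serves both text-only and image-text inputs:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X):
    # Single-head scaled dot-product self-attention over token features X: (n, d).
    # A real transformer encoder would add learned projections, multiple heads,
    # residual connections, and stacked layers.
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)
    return softmax(scores, axis=-1) @ X

def shared_encode(text_feats, image_feats=None):
    # Concatenate modalities into one token sequence; image tokens are optional,
    # so the same encoder handles both text-only and image-text problems.
    tokens = text_feats if image_feats is None else np.vstack([text_feats, image_feats])
    return self_attention(tokens)

rng = np.random.default_rng(0)
text = rng.normal(size=(5, 8))    # 5 text tokens, feature dim 8
image = rng.normal(size=(3, 8))   # 3 diagram tokens, feature dim 8

h_text_only = shared_encode(text)          # text-only problem -> (5, 8)
h_multimodal = shared_encode(text, image)  # image-text problem -> (8, 8)
```

In this toy version the output sequence keeps one vector per input token, which a shared decoder (not sketched here) could then consume uniformly regardless of how many modalities contributed tokens.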