Textbook question answering (TQA) is the task of correctly answering diagram or nondiagram (ND) questions given large multimodal contexts consisting of abundant essays and diagrams. In real-world scenarios, an explainable TQA system plays a key role in deepening humans' understanding of learned knowledge. However, no existing work investigates how to provide such explanations. To address this issue, we devise a novel architecture toward span-level eXplanations for TQA (XTQA). In this article, a span is a combination of sentences within a paragraph. The key idea is to treat the entire textual context of a lesson as candidate evidence and then apply our proposed coarse-to-fine grained explanation extracting (EE) algorithm to narrow down the evidence scope and extract span-level explanations of varying lengths for answering different questions. The EE algorithm can also be integrated into other TQA methods to make them explainable and improve their TQA performance. Experimental results show that XTQA obtains the best overall explanation result, a mean intersection over union (mIoU) of 52.38%, on the first 300 questions (ND: 150 and diagram: 150) of the CK12-QA test splits, demonstrating the explainability of our method. The results also show that XTQA achieves the best TQA performance of 36.46% and 36.95% on the aforementioned splits. We have released our code at https://github.com/dr-majie/opentqa.
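To make the explanation metric concrete, the following is a minimal sketch of how a span-level mIoU could be computed. It assumes that a span is represented as a collection of sentence indices within a paragraph and that IoU is taken over those index sets; the abstract does not specify the exact formulation, and the function names `span_iou` and `mean_iou` are hypothetical rather than taken from the released code.

```python
from typing import Iterable, List, Set, Tuple


def span_iou(predicted: Iterable[int], gold: Iterable[int]) -> float:
    """IoU between two spans, each given as sentence indices (assumed representation)."""
    pred_set: Set[int] = set(predicted)
    gold_set: Set[int] = set(gold)
    union = pred_set | gold_set
    if not union:
        # Both spans are empty: define the score as 0.0 to avoid dividing by zero.
        return 0.0
    return len(pred_set & gold_set) / len(union)


def mean_iou(pairs: List[Tuple[Iterable[int], Iterable[int]]]) -> float:
    """Average span IoU over (predicted, gold) pairs, one pair per question."""
    if not pairs:
        return 0.0
    return sum(span_iou(pred, gold) for pred, gold in pairs) / len(pairs)
```

For instance, a predicted span covering sentences 1 to 3 and a gold span covering sentences 2 to 4 share two sentences out of four distinct ones, so `span_iou([1, 2, 3], [2, 3, 4])` gives 0.5.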