Explainable paper classification system using topic modeling and SHAP

Nakyung Shin,Hohyun Jung,Joonhui Kim,Yulhee Lee,Heesung Moon

doi:10.3233/ida-240075

Nakyung Shin, Hohyun Jung + Show 3 more

https://doi.org/10.3233/ida-240075

Copy DOI

Export

Save

Cite

Journal: Intelligent Data Analysis

Publication Date: Aug 1, 2024

Abstract
Full-Text
Similar Papers

Abstract

Listen

The exponential growth of academic papers necessitates sophisticated classification systems to effectively manage and navigate vast information repositories. Despite the proliferation of such systems, traditional approaches often rely on embeddings that do not allow for easy interpretation of classification decisions, creating a gap in transparency and understanding. To address these challenges, we propose an innovative explainable paper classification system that combines Latent Semantic Analysis (LSA) for topic modeling with explainable artificial intelligence (XAI) techniques. Our objective is to identify which topics significantly influence the classification outcomes, incorporating Shapley additive explanations (SHAP) as a key XAI technique. Our system extracts topic assignments and word assignments from paper abstracts using latent semantic analysis (LSA) topic modeling. Topic assignments are then employed as embeddings in a multilayer perceptron (MLP) classification model, with the word assignments further utilized alongside SHAP for interpreting the classification results at the corpus, document, and word levels, enhancing interpretability and providing a clear rationale for each classification decision. We applied our model to a dataset from the Web of Science, specifically focusing on the field of nanomaterials. Our model demonstrates superior classification performance compared to several baseline models. Ultimately, our proposed model offers a significant advancement in both the performance and explainability of the system, validated by case studies that illustrate its effectiveness in real-world applications.

Full Text