Retrieval-augmented generation (RAG) enhances large language models (LLMs) by querying predefined knowledge bases, drawing on external data sources beyond the models' training sets to generate accurate, context-rich responses. Most RAG implementations rely on vector similarity search, but the effectiveness of this approach and the representation of knowledge bases remain underexplored. Emerging research suggests knowledge graphs as a promising alternative. This paper presents StructuGraphRAG, which uses document structure to inform the extraction process and constructs knowledge graphs to enhance RAG for social science research, using National Survey on Drug Use and Health (NSDUH) datasets. Our method parses document structures to extract entities and relationships, from which it constructs comprehensive and relevant knowledge graphs. Experimental results show that StructuGraphRAG outperforms traditional RAG methods in accuracy, comprehensiveness, and contextual relevance. The approach gives social science researchers a robust tool for precise analysis of social determinants of health and justice, and underscores the potential of structured, document-informed knowledge graph construction in AI and social science research.