PDF-Driven Q&A: A Research Paper

Prof Sneha R Sontakke

doi:10.22214/ijraset.2024.60840

Abstract

Traditional methods of PDF retrieval often suffer from inefficiencies and inaccuracies because they rely on keyword-based search algorithms. These methods do not capture the meaning of words and struggle with synonymy (different words with the same meaning), polysemy (one word with several meanings), and context, which degrades the quality of search results. This project proposes a novel approach to address these shortcomings by developing a comprehensive system for efficient PDF document management and context-aware question answering. The system integrates a user-friendly interface for PDF upload, LangChain techniques for breaking lengthy PDFs into manageable chunks, and vectorization using OpenAI's language models. The vectorized chunks are stored in a FAISS vector database for rapid retrieval. User queries are converted into vectors using the same language model, and a semantic search algorithm matches them against the stored PDF chunks to retrieve contextually relevant answers. By presenting these answers in a user-friendly format, the system aims to enhance accessibility and usability, enabling seamless access to pertinent information within PDF documents.
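
The retrieval flow summarized above (chunk the document, embed each chunk, store the vectors, embed the query, return the closest chunks) can be illustrated with a minimal, dependency-free sketch. It is not the system's implementation: `chunk_text`, `toy_embed`, and `BruteForceIndex` are hypothetical names introduced here, the hashed bag-of-words embedder is only a stand-in for the OpenAI embedding model, and the brute-force cosine search is only a stand-in for the FAISS index. Because the toy embedder matches exact tokens, it does not capture synonyms the way a learned embedding would; it only shows how the pieces fit together.

```python
import hashlib
import math
import re

DIM = 1024  # assumed size of the toy embedding; real models define their own


def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into overlapping character windows (stand-in for a text splitter)."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reaches the end of the text
        start += step
    return chunks


def toy_embed(text):
    """Hashed bag-of-words vector, L2-normalized (stand-in for a learned embedding)."""
    vec = [0.0] * DIM
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        digest = hashlib.md5(token.encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "big") % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


class BruteForceIndex:
    """Exhaustive cosine-similarity search (stand-in for a vector database)."""

    def __init__(self):
        self.vectors = []
        self.chunks = []

    def add(self, chunks):
        for chunk in chunks:
            self.vectors.append(toy_embed(chunk))
            self.chunks.append(chunk)

    def search(self, query, k=3):
        q = toy_embed(query)  # the query uses the same embedder as the chunks
        scored = [
            (sum(a * b for a, b in zip(q, v)), chunk)
            for v, chunk in zip(self.vectors, self.chunks)
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]


if __name__ == "__main__":
    index = BruteForceIndex()
    index.add([
        "FAISS stores embedding vectors for fast similarity search",
        "The upload page accepts PDF files from the user",
        "Invoices are due within thirty days",
    ])
    for score, chunk in index.search("embedding vectors similarity search", k=2):
        print(round(score, 3), chunk)
```

In a fuller version, the retrieved chunks would be passed to a language model together with the question so that the answer is grounded in the document text. Embedding the query with the same function used for the chunks is the essential design point, since similarity scores are only meaningful when both sides live in the same vector space.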

Full Text