This paper presents a comparative analysis of large pretrained multilingual models for question answering (QA), with a specific focus on their adaptation to the Kazakh language. The study evaluates mBERT, XLM-R, mT5, AYA, and GPT on QA tasks using the Kazakh sKQuAD dataset. To improve model performance, fine-tuning strategies such as adapter modules, data augmentation (back-translation and paraphrasing), and hyperparameter optimization were applied, with learning rates, batch sizes, and training epochs adjusted to improve accuracy and stability. Among the models tested, mT5 achieved the highest F1 score, 75.72%, indicating robust generalization across diverse QA tasks. GPT-4-turbo followed closely with an F1 score of 73.33% and handled complex Kazakh QA scenarios effectively. In contrast, native Kazakh models such as Kaz-RoBERTa improved with fine-tuning but continued to lag behind the larger multilingual models, underlining the need for additional Kazakh-specific training data and further architectural enhancements. Kazakh’s agglutinative morphology and the scarcity of high-quality training data present significant challenges for model adaptation. Adapter modules reduced computational cost, allowing efficient fine-tuning in resource-constrained environments without significant performance loss, while back-translation and paraphrasing enriched the dataset and improved model adaptability and robustness. This study underscores the importance of advanced fine-tuning and data augmentation strategies for QA systems tailored to low-resource languages like Kazakh. By addressing these challenges, this research aims to make AI technologies more inclusive and accessible, offering practical insights for improving natural language processing (NLP) capabilities in underrepresented languages.
Ultimately, these findings contribute to bridging the gap between high-resource and low-resource language models, fostering a more equitable distribution of AI solutions across diverse linguistic contexts.
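
The abstract reports F1 scores without stating how they are computed. As a minimal sketch, assuming the SQuAD-style token-overlap F1 that is common in extractive QA evaluation, the metric for a single predicted answer could be computed as follows. The function name `token_f1` is hypothetical, and plain whitespace tokenization with lowercasing is a simplification: the official SQuAD script also strips punctuation and English articles, and the study may have normalized Kazakh text differently.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between a predicted and a reference answer.

    Assumption: whitespace tokenization and lowercasing only; this is an
    illustrative simplification, not the evaluation code used in the study.
    """
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()

    # If either answer is empty, score 1.0 only when both are empty.
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)

    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0

    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

A dataset-level score under this assumption would be the mean of `token_f1` over all questions, taking the maximum over reference answers when a question has several.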