This article describes an approach to training and evaluating an adapter for the popular zephyr-7b-beta language model. The adapter was designed to improve the base model's performance on tasks involving programming and understanding of the Russian language. Given the original model's strong performance on English-language tasks, the goal of the study was to broaden its linguistic and technical scope. The proposed adapter was trained on a large and diverse dataset that included programming question-answer pairs as well as Russian-language texts discussing code. The applied training methodology improves the quality of the model's responses when understanding and generating Python code from Russian-language instructions. The authors evaluated the performance of the base model with the adapter installed using various metrics, comparing it with the base model alone as well as with other advanced models in this area. The results showed significant improvements both in tasks related to writing Python code and in processing the Russian language, confirming the effectiveness of the proposed adapter.

The analysis of the experimental results showed that the adapter not only adds new functionality to the model but also significantly shifts the priority of the model's output toward the instructions included at the fine-tuning stage. To mitigate this shift in output priority, the following methods are proposed:
1) Using a query (prompt) format when preparing the training dataset that differs from the original Zephyr format.
2) Including instructions from the first stage of fine-tuning in the new instruction set.
3) Building a "mixture of experts" from several different models, where each model specializes in solving one or several tasks from the overall range of tasks as accurately as possible.
Methods for further fine-tuning a model without losing its original capabilities are a promising direction for future research in the field of resource-efficient modification of large language models.

In the conducted experiment, the large language model "HuggingFaceH4/zephyr-7b-beta" was successfully fine-tuned using the QLoRA adapter-creation method to improve its ability to write Python code from Russian instructions and to provide explanations in Russian. Testing showed that with the created adapter installed, the model's text-generation priority shifts toward generating code with explanations. Synthetic tests also demonstrated significant improvements in the model's ability to solve programming and mathematical problems. It is therefore advisable to consider swapping adapters depending on the type of task being solved. Replacing the adapter installed on a model already loaded into memory takes little time, which makes it possible to create a set of task-specific adapters and interchange them according to the task at hand (see the sketch below). Swapping adapters instead of whole models removes the need to load multiple full copies of the model into the memory of the computational accelerator, which, depending on the implementation, significantly reduces either memory usage or model loading time. Thus, the use of adapters speeds up training and optimizes the use of computational resources during model operation.
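As a minimal illustration of the adapter swapping described above, the following sketch uses the Hugging Face peft library; the adapter directory names ("local/adapter-ru-code", "local/adapter-math") are hypothetical placeholders, not artifacts published by the authors.

```python
# Minimal sketch: keep one copy of the 7B base model in accelerator
# memory and switch task-specific LoRA adapters on top of it.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "HuggingFaceH4/zephyr-7b-beta",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-beta")

# Attach a first adapter (e.g., Russian code instructions) and register
# a second one (e.g., math) without reloading the base weights.
# Adapter paths are hypothetical placeholders.
model = PeftModel.from_pretrained(base, "local/adapter-ru-code", adapter_name="ru_code")
model.load_adapter("local/adapter-math", adapter_name="math")

# Activating a different adapter is cheap compared with loading a
# second full copy of the model into accelerator memory.
model.set_adapter("math")
```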
The presented research contributes to current efforts in machine learning and natural language processing aimed at creating more versatile AI models. The demonstrated potential of adapters to improve response quality in specific domains without extensive retraining helps meet the growing demand for models for multilingual natural language processing and code generation. In the conducted experiment, the feasibility of further fine-tuning a model that had already been instruction-tuned was assessed. Methods for teaching a language model new instructions without losing its ability to follow the original ones are not well studied, which is why it was decided to test the feasibility and effectiveness of such fine-tuning. According to the description of the LoRA method, this approach is possible because during such training the original weights of the model remain unchanged and are only supplemented by the coefficients of the adapter trained on the new instructions; once the adapter's low-rank update is merged in, the total number of weights equals that of the base model (see the sketch below). The experiment showed that multilingual adaptation of language models is viable and can be implemented with parameter-efficient methods on comparatively low-end hardware.
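To make the parameter-count claim above concrete, here is a minimal numerical sketch of the LoRA update; the layer sizes, rank, and scaling factor are illustrative assumptions, not values from the experiment.

```python
# Minimal sketch of the LoRA idea: the frozen base weight W is left
# untouched, and the adapter contributes a low-rank update B @ A that
# is added at inference time or merged into W once.
import torch

d_out, d_in, r, alpha = 64, 64, 8, 16   # illustrative shapes and rank
W = torch.randn(d_out, d_in)            # frozen base weight, never updated
A = torch.randn(r, d_in) * 0.01         # trainable adapter factor
B = torch.zeros(d_out, r)               # trainable, zero-init so the update starts at 0

x = torch.randn(d_in)
scaling = alpha / r

# Adapter forward pass: base output plus the low-rank correction.
y = W @ x + scaling * (B @ (A @ x))

# Merging folds the update into a matrix of the same shape as W, so the
# merged model has exactly as many parameters as the base model.
W_merged = W + scaling * (B @ A)
assert W_merged.shape == W.shape
```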