This paper proposes D2CF, a Distribution-aware Dual-LLM Collaborative Framework for human preference prediction in large language model dialogue systems. Guided by data analysis of the Kaggle LMSYS Chatbot Arena competition, we selected two complementary base models, Gemma-2-9b and Llama-3.1-8b. The framework's main technical contributions are: (1) a model complementarity metric based on the Wasserstein distance, which optimizes model selection from a data-distribution perspective; (2) a parameter-efficient QLoRA training strategy that reduces computational overhead by 42.6% through adaptive rank adjustment and quantization optimization; and (3) a validation-set-driven dynamic weight fusion mechanism that uses attention to fuse the two models' features adaptively. In the competition, the solution performed consistently on both the public and private test sets and won a silver medal, supporting the effectiveness and robustness of distribution-aware strategies in practice. In particular, the gain from the public to the private test set indicates that the method handles shifts in data distribution well. The paper details the technical principles and implementation of the solution, providing a reproducible engineering reference for human preference prediction with large language models.
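The abstract does not include code, so the sketch below is an illustration only: one plausible reading of the Wasserstein-based complementarity score, plus a simplified scalar-weight stand-in for the paper's attention-based, validation-driven fusion. The function names, the per-class aggregation, and the inverse-validation-loss weighting are all assumptions, not the authors' implementation.

```python
# Illustrative sketch only -- not the authors' code. Assumed pieces:
# complementarity_score, fusion_weights, the per-class aggregation, and
# the softmax-over-negative-log-loss weighting.
import numpy as np
from scipy.stats import wasserstein_distance

def complementarity_score(probs_a: np.ndarray, probs_b: np.ndarray) -> float:
    """Mean per-class Wasserstein distance between two models' predicted
    preference distributions, each of shape (n_samples, n_classes).
    Larger values mean the models disagree more, i.e. are more
    complementary candidates for an ensemble."""
    assert probs_a.shape == probs_b.shape
    return float(np.mean([
        wasserstein_distance(probs_a[:, c], probs_b[:, c])
        for c in range(probs_a.shape[1])
    ]))

def fusion_weights(val_loglosses: np.ndarray, temperature: float = 1.0) -> np.ndarray:
    """Validation-driven weights: softmax over negative validation log-losses.
    A simplified scalar-weight stand-in for the attention-based fusion
    described in the abstract."""
    scores = -np.asarray(val_loglosses, dtype=float) / temperature
    exp = np.exp(scores - scores.max())  # numerically stable softmax
    return exp / exp.sum()

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-in predictions for the 3-way Arena task: A wins / B wins / tie.
    p_gemma = rng.dirichlet(np.ones(3), size=1000)
    p_llama = rng.dirichlet(np.ones(3), size=1000)
    print("complementarity:", complementarity_score(p_gemma, p_llama))
    w = fusion_weights(np.array([0.93, 0.95]))  # hypothetical val log-losses
    print("fusion weights:", w)
    p_ensemble = w[0] * p_gemma + w[1] * p_llama  # weighted probability fusion
```

In this simplified variant the fusion weights are fixed scalars chosen on the validation set, whereas the paper's mechanism computes per-example weights via attention; the scalar version is shown only because it can be stated compactly.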