Abstract
Designing and developing a machine learning program requires selecting appropriate data and algorithms, then coding and debugging according to the specific task requirements and the developer's programming experience. However, knowledge in the field of machine learning is complex and lacks systematic organization. Developers therefore often lack the experience needed to choose algorithms and design programs, which leads to long development cycles and error-prone machine learning programs. To address these issues, this article proposes and designs a machine learning program generation algorithm based on the AORBCO model. Program generation comprises two sub-abilities: algorithm decision-making and code generation. For algorithm decision-making, AD-EKG is designed to let Ego select appropriate machine learning algorithms for datasets drawn from massive amounts of data. AD-EKG combines the AORBCO model's domain knowledge base with a knowledge-graph-based recommendation algorithm and a collaborative filtering algorithm. By computing descriptive and structural information between datasets and algorithms, it obtains the interaction probability between a dataset and an algorithm, and Ego makes algorithmic decisions based on this probability. Experimental results show that AD-EKG makes full use of both structural and descriptive information and improves the accuracy of Ego's algorithm decisions. For code generation, CodeT5-EKG is designed to let Ego generate machine learning program code automatically. CodeT5-EKG combines the CodeT5 generative model with the AORBCO model's domain knowledge base: it adds auxiliary information retrieved with DPR (Dense Passage Retrieval) to the code generation task and applies diversified fusion operations to improve the quality of the generated code.
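The abstract does not give AD-EKG's scoring formula, so the following is only a minimal sketch of how an interaction probability between a dataset and candidate algorithms could be computed and used for a decision. It assumes that each dataset and algorithm already has a descriptive embedding and a structural (knowledge-graph) embedding, that the two are fused by a weighted sum with a hypothetical weight `alpha`, and that the probability is the sigmoid of an inner product, as in common collaborative-filtering and knowledge-graph recommendation models. All names and vectors are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]

def combine(descriptive: Vector, structural: Vector, alpha: float = 0.5) -> Vector:
    """Fuse descriptive and structural embeddings by a weighted sum (assumed rule)."""
    return [alpha * d + (1.0 - alpha) * s for d, s in zip(descriptive, structural)]

def interaction_probability(dataset_vec: Vector, algorithm_vec: Vector) -> float:
    """Sigmoid of the inner product, as in matrix-factorization-style CF scoring."""
    score = sum(a * b for a, b in zip(dataset_vec, algorithm_vec))
    return 1.0 / (1.0 + math.exp(-score))

def rank_algorithms(dataset: Dict[str, Vector],
                    algorithms: Dict[str, Dict[str, Vector]],
                    alpha: float = 0.5) -> List[Tuple[str, float]]:
    """Return (algorithm name, probability) pairs, most likely first."""
    d = combine(dataset["desc"], dataset["struct"], alpha)
    scored = [
        (name, interaction_probability(d, combine(a["desc"], a["struct"], alpha)))
        for name, a in algorithms.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical hand-set embeddings; a real system would learn these
# from the domain knowledge base and past dataset-algorithm interactions.
dataset = {"desc": [1.0, 0.0], "struct": [0.8, 0.2]}
algorithms = {
    "random_forest": {"desc": [1.0, 0.0], "struct": [1.0, 0.0]},
    "k_means": {"desc": [0.0, 1.0], "struct": [-1.0, 1.0]},
}

ranked = rank_algorithms(dataset, algorithms)
print(ranked)  # the decision would be ranked[0][0]
```

Under these assumptions the decision reduces to picking the top-ranked algorithm; the weight `alpha` controls how much the descriptive versus the structural information contributes.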
CodeT5-EKG combines the creativity and efficiency of generative models with DPR retrieval, improving the quality of the generated code while retaining the advantages of a generative model. Experiments show that the code it generates is of higher quality than that of other generative models with the same number of parameters.
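The retrieval-and-fusion step of CodeT5-EKG can be illustrated with a toy sketch; it is not the authors' implementation. DPR ranks passages by the inner product between a query embedding and a passage embedding, and that scoring rule is kept here. However, DPR's trained encoders are replaced by term-count vectors over a fixed vocabulary, and the "diversified fusion operations", which the abstract does not specify, are reduced to plain concatenation. The knowledge-base snippets and function names are hypothetical.

```python
import re
from typing import Dict, List, Tuple

def tokenize(text: str) -> List[str]:
    # Lowercase word tokens; a stand-in for a real subword tokenizer.
    return re.findall(r"[a-z0-9_]+", text.lower())

def build_vocab(texts: List[str]) -> Dict[str, int]:
    words = sorted({w for t in texts for w in tokenize(t)})
    return {w: i for i, w in enumerate(words)}

def encode(text: str, vocab: Dict[str, int]) -> List[float]:
    # Toy dense encoder: term counts over a fixed vocabulary.
    # Real DPR uses two trained BERT encoders (query and passage).
    vec = [0.0] * len(vocab)
    for w in tokenize(text):
        if w in vocab:
            vec[vocab[w]] += 1.0
    return vec

def dot(u: List[float], v: List[float]) -> float:
    return sum(a * b for a, b in zip(u, v))

def retrieve(query: str, passages: List[str], vocab: Dict[str, int],
             k: int = 2) -> List[Tuple[float, str]]:
    """Score every passage by inner product with the query; keep the top k."""
    q = encode(query, vocab)
    scored = [(dot(q, encode(p, vocab)), p) for p in passages]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]

def fuse(query: str, retrieved: List[Tuple[float, str]]) -> str:
    # Simplest fusion: append retrieved snippets to the query as context.
    context = " | ".join(p for _, p in retrieved)
    return f"{query} | context: {context}"

# Hypothetical domain-knowledge snippets.
knowledge_base = [
    "RandomForestClassifier fits a random forest classifier in sklearn",
    "KMeans clusters unlabeled data into k groups",
    "train_test_split splits arrays into train and test subsets",
]
query = "train a random forest classifier with sklearn"

vocab = build_vocab(knowledge_base + [query])
top = retrieve(query, knowledge_base, vocab, k=2)
fused = fuse(query, top)
print(fused)
```

In a full pipeline, the fused string would be tokenized and passed to the CodeT5 model as its input, so the generator conditions on both the task description and the retrieved auxiliary information.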