Promise and Peril of Collaborative Code Generation Models: Balancing Effectiveness and Memorization
In the rapidly evolving field of machine learning, training models with datasets from various locations and organizations presents significant challenges due to privacy and legal concerns. The exploration of effective collaborative training settings, which are capable of leveraging valuable knowledge from distributed and isolated datasets, is increasingly crucial. This study investigates key factors that impact the effectiveness of collaborative training methods in code next-token prediction, as well as the correctness and utility of the generated code, showing the promise of such methods. Additionally, we evaluate the memorization of different participant training data across various collaborative training settings, including centralized, federated, and incremental training, showing their potential risks in leaking data.
- Conference Article
13
- 10.1109/icebe.2006.94
- Jan 1, 2006
This paper extends the ebXML Web services development framework to introduce a new collaboration framework, which is based on the collaboration specification language PSML-C (Process Specification and Modeling Language for Collaboration), CCSOA (Consumer-Centric Service-Oriented Architecture), and the DDSOS (Dynamic Distributed Service-Oriented Simulation) framework. Since collaborations are inevitable and play a critical role in SOA, an effective framework will greatly reduce the effort for rapid and adaptive service composition, simulation, evaluation, and collaboration. The PSML-C collaboration framework provides a service-oriented infrastructure for process collaboration specification, modeling, design, code generation, simulation, deployment, execution, and management. This paper presents the concepts, architecture, enabling techniques, and illustrative examples that demonstrate the concepts and the techniques.
- Research Article
75
- 10.1016/j.jss.2009.06.057
- Jul 11, 2009
- Journal of Systems and Software
Tool support for the rapid composition, analysis and implementation of reactive services
- Book Chapter
7
- 10.1016/b978-0-444-63428-3.50136-3
- Jan 1, 2016
- Computer Aided Chemical Engineering
Taylor-Made Modeling and Solution of Novel Process Units by Modular CAPE-OPEN-based Flowsheeting
- Research Article
- 10.54097/qkapa469
- Dec 30, 2025
- International Journal of Energy
To address the cost fluctuations and demand-side management pressure that iron and steel enterprises face under power market reform, this paper proposes a collaborative optimization model of energy storage and blast furnace surplus gas power generation that integrates the power spot price signal and the maximum demand (MD) constraint. The model employs a multi-timescale mixed-integer nonlinear programming (MINLP) framework, integrating rolling optimization strategies and robust optimization methods. It prioritizes minimizing the enterprise's total electricity costs while simultaneously accounting for grid power purchase costs, MD penalty costs, coal-gas power generation operational costs, and energy storage operational costs. Case analysis verifies that the model has clear advantages in balancing power purchase cost against the demand penalty and in achieving global optimization. The results show that, compared with the traditional operation mode and with a scenario that considers only time-of-use price optimization, the collaborative optimization model can effectively reduce the enterprise's total electricity cost and improve both economic performance and power grid security. In addition, a robustness test further demonstrates the model's effectiveness in dealing with electricity price fluctuations. This study provides a feasible solution for iron and steel enterprises undergoing energy structure transformation and energy efficiency improvement.
- Research Article
6
- 10.1016/j.egyr.2023.04.207
- Apr 20, 2023
- Energy Reports
Distributed collaborative optimization for coupled transportation and power systems operation considering carbon emission and elastic travel demand
- Research Article
1
- 10.1007/s43441-025-00820-z
- Jun 5, 2025
- Therapeutic innovation & regulatory science
Real-world data (RWD) are increasingly recognized, including by regulatory bodies, as critical to advancing drug development and health care delivery. However, stringent governance requirements, while essential for protecting patient privacy, create significant challenges for conducting research. The Cancer Analysis System (CAS), managed by National Health Service (NHS) England, includes a national cancer registry and linked health care datasets. To address data access challenges, Simulacrum, a set of publicly available synthetic datasets generated from the CAS, can be used to carry out preliminary data analysis, hypothesis generation, and development of programming code that can be executed to run analyses on CAS data. This paper presents a collaborative operating model that leverages Simulacrum to enable efficient, privacy-compliant analytics. Analysis of 18 projects conducted using this model demonstrated an average duration of 2.3 months from the start of Code Development to Data Release (CDDR). By enabling researchers to conduct privacy-compliant analysis on synthetic data, this approach increases transparency by providing insights into patient-level data while reducing reliance on custodians of sensitive data. Our findings highlight how synthetic data can be leveraged to facilitate efficient research on restricted patient-level RWD while safeguarding patient privacy. This framework offers a scalable solution for other data custodians that can enable broader use of RWD, accelerating healthcare innovation.
- Research Article
27
- 10.1038/gim.2017.122
- Aug 10, 2017
- Genetics in Medicine
A proposed approach to accelerate evidence generation for genomic-based technologies in the context of a learning health system.
- Research Article
74
- 10.22323/2.09010205
- Mar 22, 2010
- Journal of Science Communication
From contributions of astronomy data and DNA sequences to disease treatment research, scientific activity by non-scientists is a real and emergent phenomenon that raises policy questions. This involvement in science can be understood as an issue of access to publications, code, and data that facilitates public engagement in the research process; appropriate policy to support the associated welfare-enhancing benefits is therefore essential. Current legal barriers to citizen participation can be alleviated by scientists’ use of the “Reproducible Research Standard,” thus making the literature, data, and code associated with scientific results accessible. The enterprise of science is undergoing deep and fundamental changes, particularly in how scientists obtain results and share their work: the promise of open research dissemination held by the Internet is gradually being fulfilled by scientists. Contributions to science from beyond the ivory tower are forcing a rethinking of traditional models of knowledge generation, evaluation, and communication. The notion of a scientific “peer” is blurred with the advent of lay contributions to science, raising questions regarding the concepts of peer review and recognition. New collaborative models are emerging around both open scientific software and the generation of scientific discoveries that bear a similarity to open innovation models in other settings. Public engagement in science can be understood as an issue of access to knowledge for public involvement in the research process, facilitated by appropriate policy to support the welfare-enhancing benefits deriving from citizen science.
- Research Article
- 10.3390/electronics15010015
- Dec 19, 2025
- Electronics
Bias and hallucinations concerning low-resource cultural artefacts significantly impede text-to-image generation models from understanding and disseminating them. Focusing on Tibetan culture as a Chinese minority culture, we produced a children’s picture book through two methods: AI generation and illustration by a human illustrator. Eye-tracking experiments were employed to investigate participants’ implicit attitudes, aesthetic biases, and cultural perceptions towards these two sources. The results revealed that (1) the hand-drawn group demonstrated higher fidelity to Tibetan culture, exhibiting a positive aesthetic calibration effect in terms of cultural adaptability owing to the time viewers spent attending to the details of cultural symbols, and (2) the AI-generated group elicited greater viewer interest and emotional engagement through its asymmetric color palettes, especially in color richness and stylistic rendering, and achieved professional-level compositional maturity in multi-character scene generation. This study provides empirical evidence to inform the division of labor between humans and AI in children’s book illustration and explores potential models for future human-AI collaboration.
- Research Article
- 10.3390/en18205437
- Oct 15, 2025
- Energies
To reduce the renewable energy waste and carbon emissions predicted under the current expansion plan, this study proposes a hierarchical collaborative optimization model for generation and transmission expansion planning in cross-regional power systems, considering energy storage and load transfer. In the upper layer, the upper limit of expansion is determined according to China’s current policy and expansion plan for the power system. This layer completes the annual power expansion plan and provides scale data on power generation facilities and supporting infrastructure to the lower layer. The lower layer is the operation level, which simulates the operation of the power system throughout the year. To identify the shortcomings of the current plan and provide an optimization scheme, the model is used to analyze China’s power system in 2030. The utilization of renewable energy and power facilities is analyzed, along with carbon emissions. An improved power expansion plan that comprehensively considers energy storage, transmission, and load transfer for China’s carbon peak is proposed. The proposed scheme increases the utilization rate of renewable energy to 97.058%, reduces CO2 emissions by 224 million tons, and reduces the installed capacity of thermal power by about 18.686 million kilowatts, verifying the effectiveness of the scheme.
- Research Article
3
- 10.1155/2024/4734030
- Jan 1, 2024
- International Journal of Intelligent Systems
Intelligent traffic signal systems, crucial for intelligent transportation systems, have been widely studied and deployed to enhance vehicle traffic efficiency and reduce air pollution. Unfortunately, intelligent traffic signal systems are at risk of data spoofing attacks, which can cause traffic delays, congestion, and even paralysis. In this paper, we reveal a multivehicle collaborative data spoofing attack on intelligent traffic signal systems and propose a collaborative attack sequence generation model based on multiagent reinforcement learning (RL), aiming to explore efficient and stealthy attacks. Specifically, we first model the spoofing attack as a Partially Observable Markov Decision Process (POMDP) at single and multiple intersections. This involves constructing the state space and action space and defining a reward function for the attack. Then, based on the attack modeling, we propose an automated approach for generating collaborative attack sequences using the Multi‐Actor‐Attention‐Critic (MAAC) algorithm, a mainstream multiagent RL algorithm. Experiments conducted on the multimodal traffic simulation (VISSIM) platform demonstrate a 15% increase in delay time (DT) and a 40% reduction in attack ratio (AR) compared to the single‐vehicle attack, confirming the effectiveness and stealthiness of our collaborative attack.
- Conference Article
4
- 10.1109/ei2.2018.8582660
- Oct 1, 2018
Research on the collaborative scheduling of the various networks of the future energy Internet is a complex systems engineering problem. With the rapid development of the energy Internet, the joint application of distributed generation and electric vehicles will penetrate the existing power grid, and the existing grid structure will be greatly affected. Based on research into the dynamic spatiotemporal characteristics of different types of electric vehicles, this paper proposes a technical route for probabilistic load modeling of fast-charging stations and an optimal charging control strategy that takes into account the complex interaction between the power network and the traffic network. Based on the modeling of the main renewable energy sources (wind and solar), a collaborative scheduling model of electric vehicles, the power system, and new power generation is studied, which promotes the mutual penetration and deep integration of the energy Internet and intelligent management decision-making.
- Research Article
- 10.13052/spee1048-5236.44413
- Oct 31, 2025
- Strategic Planning for Energy and the Environment
The power system faces huge challenges in reducing carbon emissions and improving economic benefits. The traditional electricity price collaborative optimization model cannot fully exploit the synergy between carbon capture technology and the tiered electricity price mechanism. There is an urgent need for a new low-carbon optimization model to cope with the requirements of energy transformation and sustainable development goals. The power system structure is mapped through a multi-energy coupled digital twin system to achieve dynamic perception and modelling of power generation, load, and carbon emission processes. A response mechanism driven by carbon-electricity collaboration is constructed, combining multi-modal data fusion technology and using an LSTM-CNN deep neural network to mine the collaborative rules among carbon capture devices, electricity markets, and user behaviour. In terms of optimization algorithms, a dual-objective reinforcement learning model based on a deep Q network (DQN) is proposed to find a dynamic balance between economy and low carbon, and mixed integer programming methods are integrated to handle complex system constraints and improve solution efficiency and feasibility. In the synergy model of carbon capture and tiered electricity price, when the carbon capture efficiency reaches 45.67%, the carbon emission intensity of high-carbon units drops from 91.12 g/kWh to 62.34 g/kWh, a decrease of 31.5%. When the coverage rate of the second-tiered electricity price is 23.89%, the peak load of industrial users is reduced by 17.56%, and the peak load is increased by 100% × 34.23% (relative to the benchmark). The collaborative strategy enabled the system’s comprehensive carbon emission reduction rate to reach 34.23%, which was 23.1% higher than that of single carbon capture, verifying the coupling and efficiency of price signals and technical means.
- Conference Article
- 10.1109/sibgrapi55357.2022.9991773
- Oct 24, 2022
Collaborative Filtering stands as an underlying strategy to reasonably deal with large-scale problems like scalability and high sparsity. In the classifier fusion context, one could benefit from adopting such a strategy to learn decision templates effectively for the sake of computation efficiency. This paper introduces a framework that explores collaborative filtering-based latent factors models for fast decision template generation, assuming it has a sparse matrix structure. Experiments conducted over five general-purpose public datasets and statistically assessed have demonstrated its feasibility for building decision templates under low sparsity conditions and datasets labeled with fewer classes. Under such conditions, the proposed framework showed competitive recognition rates, significantly reducing computational costs, particularly when distance-based classifiers are employed for ensemble learning purposes.
- Research Article
9
- 10.1177/154193121005402002
- Sep 1, 2010
- Proceedings of the Human Factors and Ergonomics Society Annual Meeting
Applying human factors and user-centered systems engineering methodology and design principles to the development of smart cities has the potential to establish a novel field of research. This paper introduces a novel human factors knowledge management framework for collaborative education, design, and modeling of the next generation of smarter cities. A conceptual framework and practical applications of systems engineering approaches to support smarter cities development are proposed. The human systems component in collaborative systems engineering aims to ensure that human considerations for learners and designers have a prominent place in the integrated design and development of sustainable, smarter cities throughout the total system lifecycle. Future challenges that collaborative human and systems engineering techniques are likely to face in this domain are also discussed.