Using norms to guide and coordinate interactions has gained considerable attention in the multi-agent community. However, new challenges arise as interest shifts towards dynamic socio-technical systems, where human and software agents interact and interactions must adapt to humans' changing needs. For instance, different agents (human or software) might not share the same understanding of what it means to violate a norm (e.g., what characterizes hate speech), or their understanding of a norm might change over time (e.g., what constitutes an acceptable response time). The challenge is to address these issues by learning the meaning of a norm violation from limited interaction data. To this end, we use batch and incremental learning to train an ensemble of classifiers. Ensemble learning and data sampling handle the imbalanced class distribution of the interaction stream. At the same time, the two training approaches use different strategies to ensure that the ensemble models reflect the latest community view on the meaning of a norm violation: batch learning uses weight assignment, while incremental learning continuously updates the ensemble models as community members interact. Here, we extend our previous work by creating a different balancing strategy for online learning and by integrating interpretability to understand norm violations. Additionally, we evaluate the proposed approaches in the context of Wikipedia article edits, where interactions revolve around editing articles and the norm in question prohibits vandalism. Lastly, we conduct ablation studies to compare the ensemble's performance against a single-model approach and to examine the behavior of two data-sampling techniques. Results indicate that the different machine learning frameworks can learn the meaning of a norm violation in a setting with data imbalance and concept drift, although with significant differences among them.