What Is Wrong with My Model? Identifying Systematic Problems with Semantic Data Slicing

Abstract

Machine learning models make mistakes, yet sometimes it is difficult to identify the systematic problems behind the mistakes. Practitioners engage in various activities, including error analysis, testing, auditing, and red-teaming, to form hypotheses of what can go (or has gone) wrong with their models. To validate these hypotheses, practitioners employ data slicing to identify relevant examples. However, traditional data slicing is limited by available features and programmatic slicing functions. In this work, we propose SemSlicer, a framework that supports semantic data slicing, which identifies a semantically coherent slice, without the need for existing features. SemSlicer uses Large Language Models to annotate datasets and generate slices from any user-defined slicing criteria. We show that SemSlicer generates accurate slices with low cost, allows flexible trade-offs between different design dimensions, reliably identifies under-performing data slices, and helps practitioners identify useful data slices that reflect systematic problems.
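The idea of semantic data slicing can be illustrated with a minimal sketch. This is not SemSlicer's actual API: the names `Example`, `keyword_annotator`, `semantic_slice`, and `accuracy` are hypothetical, the toy data is invented for illustration, and a simple keyword check stands in for the Large Language Model that would annotate each example against a user-defined criterion.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Example:
    text: str
    label: int       # ground-truth label
    prediction: int  # output of the model under test


def keyword_annotator(text: str, criterion: str) -> bool:
    """Hypothetical stand-in for an LLM annotation call.

    A real semantic slicer would prompt an LLM with the slicing
    criterion; here we only check for a keyword (an assumption).
    """
    return criterion.lower() in text.lower()


def semantic_slice(
    data: List[Example],
    criterion: str,
    annotate: Callable[[str, str], bool],
) -> List[Example]:
    """Return the examples that the annotator marks as matching the criterion."""
    return [ex for ex in data if annotate(ex.text, criterion)]


def accuracy(examples: List[Example]) -> float:
    """Fraction of examples where the prediction equals the label."""
    if not examples:
        return float("nan")
    return sum(ex.label == ex.prediction for ex in examples) / len(examples)


# Toy data, invented for illustration only.
data = [
    Example("The refund process was slow", 0, 0),
    Example("Great phone, fast refund", 1, 0),
    Example("Battery life is excellent", 1, 1),
    Example("No refund was offered", 0, 1),
    Example("Screen cracked after a week", 0, 0),
]

# Hypothesis to validate: the model struggles on texts that mention refunds.
refund_slice = semantic_slice(data, "refund", keyword_annotator)
overall_acc = accuracy(data)
slice_acc = accuracy(refund_slice)

print(f"overall accuracy: {overall_acc:.2f}")
print(f"slice accuracy:   {slice_acc:.2f} on {len(refund_slice)} examples")
```

Comparing the slice accuracy with the overall accuracy is one simple way to check whether a hypothesized slice under-performs. Because the annotator is passed in as a function, an LLM-backed one could replace the keyword check without changing the rest of the sketch.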

