To enhance patient safety in healthcare, it is crucial to address the underreporting of issues in Critical Incident Reporting Systems (CIRSs). This study aims to evaluate the effectiveness of generative Artificial Intelligence and Natural Language Processing (AI/NLP) in reviewing CIRS cases by comparing its performance with that of human reviewers and by categorising these cases into relevant topics. A case-control feasibility study was conducted using cases from the German CIRS-Anaesthesiology subsystem. Each case was reviewed by a human expert and by an AI/NLP model (ChatGPT-3.5). Two CIRS experts then assessed these reviews in a blinded fashion, rating them on linguistic expression, recognisable expertise, logical derivability, and overall quality on six-point Likert scales. On average, the CIRS experts correctly classified 80% of the human-written reviews as created by a human, while misclassifying 45.8% of the AI-generated reviews as written by a human. Ratings on a scale of 1 (very good) to 6 (failed) revealed comparable performance between human- and AI-generated reviews across the dimensions of linguistic expression (p = 0.39), recognisable expertise (p = 0.89), logical derivability (p = 0.84), and overall quality (p = 0.87). The AI model was also able to categorise the cases into relevant topics independently. This feasibility study demonstrates the potential of generative AI/NLP to analyse and categorise cases from a CIRS, which could have implications for improving incident reporting in healthcare. Further research is needed to confirm and extend these findings.
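The abstract does not describe how the model was queried. As one illustration of what an automated review step could look like, the sketch below calls OpenAI's gpt-3.5-turbo (the model family behind ChatGPT-3.5) through the openai Python SDK. The prompt wording, the temperature setting, and the `review_cirs_case` helper are hypothetical, not taken from the study, which may have used the ChatGPT interface directly.

```python
# Hypothetical sketch of an automated CIRS review step, assuming API access
# to OpenAI's gpt-3.5-turbo. The study's actual prompt is not published;
# the wording below is illustrative only.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def review_cirs_case(case_text: str) -> str:
    """Ask the model to review an incident report and assign it a topic."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        temperature=0.2,  # illustrative choice: keep reviews focused
        messages=[
            {
                "role": "system",
                "content": (
                    "You are an expert reviewer for an anaesthesiology "
                    "critical incident reporting system (CIRS)."
                ),
            },
            {
                "role": "user",
                "content": (
                    "Review the following incident report: identify the "
                    "contributing factors, suggest preventive measures, and "
                    "assign the case to one relevant topic category.\n\n"
                    + case_text
                ),
            },
        ],
    )
    return response.choices[0].message.content


print(review_cirs_case("During induction, the laryngoscope light failed..."))
```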
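The abstract reports p-values without naming the statistical test. As a minimal sketch, assuming a Mann-Whitney U test (a common choice for comparing two independent groups of ordinal Likert ratings), the comparison for one rating dimension could be run as follows; the rating lists are made-up placeholders, not study data.

```python
# Hypothetical comparison of human- vs AI-generated review ratings on one
# dimension. The test choice is an assumption; the abstract does not name it.
from scipy.stats import mannwhitneyu

# 1 = very good ... 6 = failed (placeholder ratings, one per review)
human_ratings = [2, 1, 3, 2, 2, 1, 3, 2]
ai_ratings = [2, 2, 3, 1, 2, 3, 2, 2]

stat, p_value = mannwhitneyu(human_ratings, ai_ratings, alternative="two-sided")
print(f"U = {stat:.1f}, p = {p_value:.2f}")  # p > 0.05: no detectable difference
```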