In the ever-evolving field of software engineering, the advent of large language models and conversational interfaces, exemplified by ChatGPT, marks a significant shift. While their potential is evident in various domains, this paper extends our previous research, in which we experimented with GPT-4’s ability to create safety cases. A safety case is a structured argument, supported by a body of evidence, that demonstrates that a given system is safe to operate in a given environment. In this paper, we first assess GPT-4’s comprehension of the Goal Structuring Notation (GSN), a well-established notation for visually representing safety cases. We then conduct four distinct experiments to evaluate GPT-4’s ability to generate safety cases for a specified system and application domain. To assess its performance, we compare the safety cases it produces with ground-truth safety cases developed for an X-ray system, a machine learning-enabled component for tire noise recognition in a vehicle, and a lane management system from the automotive domain. This comparison provides insight into the model’s generative capabilities. Our findings indicate that GPT-4 is able to generate moderately accurate and reasonable safety cases.