Abstract

Study question
Is it possible to build a robust and accurate large language model (LLM) from anonymized patient records to predict live birth occurrence?

Summary answer
A prototype provides answers in different languages, creates charts and tables from natural language inputs, and runs statistical analysis, predicting live birth occurrence with 78% accuracy.

What is known already
The capability for a non-technical person to query clinically valid, safely managed data represents the single largest step towards the democratization of clinical knowledge in the history of ART. Patients are used to Googling their symptoms, and ChatGPT, which was trained on the open internet, will provide equally poor advice. However, when the information is locked down and limited to a specific data set, the ability of an LLM to parse questions as code and return answers specific to that data set presents great opportunities to improve patient and clinician awareness.

Study design, size, duration
Design: proof of concept.
Size: 1 million anonymized patient outcome records gathered from the regulator's anonymized register (HFEA, 1991 to 2018).
Duration: August to December 2023.

Participants/materials, setting, methods
Participants/materials: 1 million anonymized patient outcome records gathered from the regulator's anonymized register (HFEA, 1991 to 2018).
Method: The data were merged and pre-processed for analysis. For the machine-learning model, an ensembling approach with random forest and gradient boosting (GB) was chosen. Different models were tested to reach maximum accuracy and performance. Patient scenarios were built from evidence-based cases. A ChatGPT-like interface was preferred because it is simple for doctors and patients to interact with.
Main results and the role of chance
A prototype model was created from anonymized patient outcome records, and various patient scenarios based on evidence-based cases were formulated to demonstrate new capabilities. The capabilities include the ability to provide answers in different languages for non-native English speakers, to dynamically create charts and tables from natural language inputs, and to run sophisticated statistical analysis linked to the underlying dataset, for example predicting live birth occurrence from demographic inputs and presentation of symptoms. The prototype went as far as creating an ML model that predicts live birth occurrence with 78% accuracy. This was achieved by using an ensembling method (random forest) for a binary classification problem, with live birth occurrence as the target and 14 features, such as age, treatment, and diagnosis. The technical detail of how this was delivered is extremely important for explainable AI and for engendering trust, as opposed to replacing the doctor with just another black box. Even so, the fact that a predictive model based on 1 million patients was built and linked back to a specific patient scenario in under 30 seconds from the end-user perspective is proof of capability: personalized insights can be generated quickly and accurately in a clinical consultation.

Limitations, reasons for caution
The study was a proof of concept, which needs further research and validation in a clinical setting.

Wider implications of the findings
The proof of concept evidences the capabilities of LLMs to increase awareness of fertility information. It is accessible in 40 different languages, on demand, free to the end user, and clearly visualized. Access to accurate information that provides personalized insights based on specific criteria speeds up the time to diagnosis and treatment.

Trial registration number
n/a
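
The abstract names random forest and GB as the ensemble members but does not say how their outputs were combined or scored. As a minimal sketch, assuming soft voting (averaging each model's predicted probability of live birth and thresholding at 0.5), the combination step and the accuracy metric could look like the following. The function names `soft_vote` and `accuracy` are hypothetical, and the probabilities and labels are made-up illustrative values, not study data.

```python
from statistics import mean


def soft_vote(prob_lists, threshold=0.5):
    """Average per-model probabilities of live birth, then threshold.

    prob_lists: one list of probabilities per model, all the same length.
    Returns a list of 0/1 predictions (1 = live birth predicted).
    """
    n = len(prob_lists[0])
    if any(len(probs) != n for probs in prob_lists):
        raise ValueError("all models must score the same number of records")
    averaged = [mean(model[i] for model in prob_lists) for i in range(n)]
    return [1 if p >= threshold else 0 for p in averaged]


def accuracy(y_true, y_pred):
    """Fraction of records whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Illustrative values only (assumed for this sketch, not taken from the study).
rf_probs = [0.80, 0.30, 0.55, 0.10, 0.65]  # hypothetical random forest outputs
gb_probs = [0.70, 0.45, 0.35, 0.20, 0.75]  # hypothetical GB outputs
y_true = [1, 0, 1, 0, 1]                   # hypothetical observed outcomes

y_pred = soft_vote([rf_probs, gb_probs])
acc = accuracy(y_true, y_pred)
print(f"predictions={y_pred} accuracy={acc:.2f}")
```

Accuracy on its own can mislead when live births are much rarer than non-births, so a clinical validation would likely report it alongside sensitivity and specificity.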