Purpose: To evaluate the extent to which experienced reviewers can accurately distinguish between AI-generated and original research abstracts published in the field of shoulder and elbow surgery, and to compare their performance with that of an AI-detection tool.

Methods: Twenty-five shoulder- and elbow-related articles published in high-impact journals in 2023 were randomly selected. ChatGPT was prompted with only the abstract title to create an AI-generated version of each abstract. The resulting 50 abstracts were randomly distributed to and evaluated by 8 blinded peer reviewers, each with at least 5 years of experience. Reviewers were tasked with distinguishing between original and AI-generated text. A Likert scale was used to assess reviewer confidence in each interpretation, and the primary reason guiding each determination was recorded. AI output detector scores (0-100%) and plagiarism scores (0-100%) were obtained using GPTZero.

Results: Reviewers correctly identified 62% of AI-generated abstracts and misclassified 38% of original abstracts as AI-generated. GPTZero reported a significantly higher probability of AI output for generated abstracts (median 56%, IQR 51-77%) than for original abstracts (median 10%, IQR 4-37%; p < 0.01). Generated abstracts scored significantly lower on the plagiarism detector (median 7%, IQR 5-14%) than original abstracts (median 82%, IQR 72-92%; p < 0.01). Correct identification of AI-generated abstracts was predominantly attributed to the presence of unrealistic data or values, whereas misidentification of original abstracts as AI-generated was most often attributed to writing style.

Conclusions: Experienced reviewers had difficulty distinguishing between human- and AI-generated research content in shoulder and elbow surgery. The presence of unrealistic data facilitated correct identification of AI-generated abstracts, whereas misidentification of original abstracts was most often ascribed to writing style.