Introduction
Injuries of the long head of biceps (LHB) tendon are common but difficult to diagnose, whether clinically or with imaging. Arthroscopy is the preferred means of diagnostic assessment of the LHB, but it often proves challenging, and its reliability and reproducibility have not yet been assessed. Artificial intelligence (AI) could assist in the arthroscopic analysis of the LHB. The main objective of this study was to evaluate inter-observer agreement for the specific assessment of the LHB, using an analysis protocol based on images of interest. The secondary objective was to define a video database, called “ground truth”, intended for creating and training an AI for LHB assessment.

Hypothesis
The hypothesis was that inter-observer agreement on standardized images would be strong enough for the “ground truth” videos to serve as an input database for an AI solution for arthroscopic LHB diagnosis.

Materials and methods
One hundred and ninety-nine sets of standardized arthroscopic images of LHB exploration were evaluated by 3 independent observers. Each observer had to classify the tendon as healthy or pathological and, if pathological, specify the type of lesion: partial tear, hourglass hypertrophy, instability, fissure, superior labral anterior posterior lesion (SLAP 2), chondral print, or pathological pulley without instability. Inter-observer agreement was measured using Cohen's Kappa (K) coefficient and Kappa Accuracy (KappaAcc).

Results
For determining the healthy or pathological state of the LHB, the strength of agreement was moderate to strong depending on the observers (Kappa 0.54 to 0.7 and KappaAcc from 86 to 92%). When the tendon was pathological, the strength of agreement was moderate to strong for a partial tear (Kappa 0.49 to 0.71 and KappaAcc from 85 to 92%), a fissure (Kappa −0.5 to 0.7 and KappaAcc from 36 to 93%) or a SLAP tear (Kappa 0.54 to 0.88 and KappaAcc from 90 to 97%).
It was low for unstable lesions (Kappa 0.04 to 0.25 and KappaAcc from 36 to 88%).

Conclusion
The analysis of the LHB from arthroscopic images showed a high level of agreement for determining whether the tendon was healthy or pathological. However, agreement decreased for the diagnosis of rare or dynamic tendon lesions. An AI built from human analysis would therefore face the same difficulties if it were limited to arthroscopic analysis alone. The integration of clinical and paraclinical data is necessary to improve the arthroscopic diagnosis of LHB injuries. It also appears to be an essential prerequisite for building a so-called “ground truth” database on which to develop a high-performance AI solution.

Level of evidence
III; inter-observer prospective series.
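
As an illustration of the agreement statistic named in the methods, the following is a minimal sketch of pairwise Cohen's Kappa between observers. It is not the authors' analysis code. The function names, observer names and labels are hypothetical, and the toy ratings are invented for demonstration only. The abstract does not define Kappa Accuracy, so the sketch does not attempt to compute it.

```python
from itertools import combinations


def cohen_kappa(a, b):
    """Cohen's Kappa for two raters who labelled the same items."""
    if len(a) != len(b) or not a:
        raise ValueError("both raters must label the same non-empty item set")
    n = len(a)
    # Observed agreement: fraction of items given identical labels.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement, from each rater's marginal label frequencies.
    labels = set(a) | set(b)
    p_e = sum((a.count(lab) / n) * (b.count(lab) / n) for lab in labels)
    if p_e == 1.0:
        # Both raters used one identical label throughout.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


def pairwise_kappa(ratings):
    """Kappa for every pair of raters in a {name: labels} mapping."""
    return {
        (r1, r2): cohen_kappa(ratings[r1], ratings[r2])
        for r1, r2 in combinations(ratings, 2)
    }


# Invented toy data (NOT from the study): 4 image sets, 3 observers.
toy_ratings = {
    "observer_1": ["pathological", "pathological", "healthy", "healthy"],
    "observer_2": ["pathological", "healthy", "healthy", "healthy"],
    "observer_3": ["pathological", "pathological", "healthy", "healthy"],
}

for pair, kappa in pairwise_kappa(toy_ratings).items():
    print(pair, round(kappa, 2))
```

With three observers, Cohen's Kappa is computed per pair, which is one plausible reading of why the results are reported as ranges rather than single values.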