Abstract
Study question
How strong is the agreement between embryo morphokinetic annotations performed by experienced embryologists and those produced by an automated annotation system based on artificial intelligence (AI)?
Summary answer
Agreement between manual and automated annotation, as measured by the intraclass correlation coefficient (ICC), was strong or very strong for all analysed morphokinetic variables.
What is known already
Moving from time-lapse imaging to selecting embryos for transfer, freezing or discard requires annotation, that is, converting images into numerical data. These data can then serve as input to selection models that quantify embryo viability. Currently, embryos are annotated manually by embryologists, a process that can be subjective and time-consuming. Clinics therefore limit annotation to a manageable number of variables, which has led to a range of clinic practices. Operator variation is an additional challenge, despite the development of standardised definitions and quality assurance schemes. AI may help resolve these challenges.
Study design, size, duration
Retrospective comparative analysis of 2442 embryos from IVF and ICSI cycles at four private fertility clinics belonging to the same group in the UK. All embryos cultured in a time-lapse incubator (EmbryoScope, Vitrolife) between January 2016 and 2019 were included in the study. Manual annotations (MA) and automated annotations (AA) were compared using a two-way mixed intraclass correlation coefficient (ICC), interpreted on a five-category agreement scale: very weak (0-0.20), weak (0.21-0.40), moderate (0.41-0.60), strong (0.61-0.80) and very strong (0.81-1.00).
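The abstract names a two-way mixed ICC but does not say whether the consistency or absolute-agreement form, or single or average measures, were used. As a minimal sketch, assuming one MA and one AA time per embryo for a given parameter and single-measure forms, the hypothetical functions below compute ICC(3,1) (consistency) or ICC(A,1) (absolute agreement) from two-way ANOVA mean squares and map the result onto the five bands quoted above. They are illustrative only, not the study's analysis code.

```python
from statistics import fmean

# Agreement bands quoted in the abstract: (upper bound, label).
AGREEMENT_BANDS = [
    (0.20, "very weak"),
    (0.40, "weak"),
    (0.60, "moderate"),
    (0.80, "strong"),
    (1.00, "very strong"),
]


def icc_two_way_mixed(ma, aa, absolute=False):
    """Single-measure ICC for paired manual (ma) and automated (aa) times.

    absolute=False gives ICC(3,1) (consistency); absolute=True gives
    ICC(A,1) (absolute agreement). Which form the study used is ASSUMED.
    """
    if len(ma) != len(aa) or len(ma) < 2:
        raise ValueError("need at least two paired annotations")
    rows = list(zip(ma, aa))  # one row per embryo, one column per annotator
    n, k = len(rows), 2
    grand = fmean([v for row in rows for v in row])
    row_means = [fmean(row) for row in rows]
    col_means = [fmean(col) for col in zip(*rows)]

    # Two-way ANOVA sums of squares: embryos (rows), annotators (columns), residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_err = sum(
        (rows[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n)
        for j in range(k)
    )
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))

    denom = ms_rows + (k - 1) * ms_err
    if absolute:
        # Absolute agreement also penalises a systematic offset between annotators.
        denom += k / n * (ms_cols - ms_err)
    return (ms_rows - ms_err) / denom


def classify_agreement(icc):
    """Map an ICC onto the five bands; values <= 0.20 (including
    negative ICCs) are treated as very weak."""
    for upper, label in AGREEMENT_BANDS:
        if icc <= upper:
            return label
    return "very strong"  # guards against rounding slightly above 1.0


# Hypothetical example: AA is always exactly 1 h later than MA.
ma = [10.0, 20.0, 30.0, 40.0]
aa = [11.0, 21.0, 31.0, 41.0]
print(icc_two_way_mixed(ma, aa))                           # 1.0
print(round(icc_two_way_mixed(ma, aa, absolute=True), 3))  # 0.997
print(classify_agreement(0.72))                            # strong
```

The consistency form ignores a constant offset between annotators, so the fixed 1 h lag in the example still scores 1.0, whereas the absolute-agreement form penalises it slightly. Which behaviour is appropriate depends on whether systematic timing bias between MA and AA should count as disagreement. Embryos without an AA result would need to be dropped from both lists before calling the function.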
Participants/materials, setting, methods
Videos were manually annotated by experienced embryologists from the time of pronuclear fading (tPNf) to the time of expanded blastocyst (tEB), with the times of all intermediate stages annotated: two-cell (t2), three-cell (t3), four-cell (t4), five-cell (t5), six-cell (t6), seven-cell (t7), eight-cell (t8), nine-cell (t9), morula (tM), start of blastulation (tSB) and full blastocyst (tB). Blind to the manual annotations, and without any prior training on this dataset, CHLOE (Fairtility) annotated the same videos to produce the automated annotation data.
Main results and the role of chance
AA did not return a result for 15.4% of the expected annotations (3235/21008 MA). Very strong agreement (0.81-1.00) between MA and AA was found for tPNf, t2, t3, t5, t6, tM, tSB, tB and tEB. Strong agreement (0.61-0.80) was found for t4, t7, t8 and t9+. Outliers in the AA data, defined as AA values deviating from the MA by more than one standard deviation, were examined further for five key morphokinetic parameters: t2, t5, t8, tSB and tB. A total of 269 outliers were identified. For t2 outliers (n = 14, 6%), the average time difference was 5.97 h (range 5.50-24.44 h), and all embryos with a t2 outlier were classed as either poor (PQ) or average quality (AQ). The t5 outliers (n = 45, 19%) had an average time difference of 2.84 h (range 9.33-36.69 h); 96% (n = 43) of these embryos were classed as PQ (n = 25, 56%) or AQ (n = 18, 40%). Outliers for t8 (n = 138, 58%) differed between MA and AA by 17.53 h on average (range 12.68-40.35 h); 94% (n = 130) of these embryos were classed as PQ (n = 77, 56%) or AQ (n = 53, 38%). The tSB outliers (n = 28, 12%) had an average time difference of 3.58 h (range 0.71-14.39 h); 89% (n = 25) of these embryos were classed as PQ (n = 16, 57%) or AQ (n = 9, 32%). Finally, outliers associated with tB (n = 44, 18%) had an average time difference of 6.39 h (range 0.02-33.67 h); 95% (n = 42) of these embryos were classed as PQ (n = 38, 86%) or AQ (n = 4, 9%).
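The outlier screen and the related counts can be sketched as follows. The abstract does not state which standard deviation defines an outlier, so this sketch ASSUMES the SD of the manual times for each parameter, flags an embryo when |AA - MA| exceeds it, and skips embryos with no AA result. The data layout, function names and toy times are hypothetical; the sketch also reproduces the missing-annotation arithmetic (3235/21008), counts embryos flagged in more than one parameter, and computes the quality-grade shares reported for each parameter.

```python
from collections import Counter
from statistics import stdev


def missing_rate(n_missing, n_expected):
    """Percentage of expected annotations with no automated result."""
    return 100 * n_missing / n_expected


def flag_outliers(annotations, n_sd=1.0):
    """Flag embryos whose AA time is far from the MA time.

    annotations: {parameter: {embryo_id: (ma_hours, aa_hours or None)}}
    Returns {parameter: set of flagged embryo_ids}.
    ASSUMPTION: threshold = n_sd * SD of the manual times for that
    parameter; embryos with aa=None are skipped.
    """
    flagged = {}
    for param, by_embryo in annotations.items():
        paired = {e: (ma, aa) for e, (ma, aa) in by_embryo.items() if aa is not None}
        threshold = n_sd * stdev([ma for ma, _ in paired.values()])
        flagged[param] = {
            e for e, (ma, aa) in paired.items() if abs(aa - ma) > threshold
        }
    return flagged


def multi_parameter_outliers(flagged):
    """Embryos flagged as outliers in more than one parameter."""
    counts = Counter(e for ids in flagged.values() for e in ids)
    return {e for e, c in counts.items() if c > 1}


def grade_breakdown(embryo_ids, grades):
    """Share of flagged embryos in each quality grade (e.g. 'PQ', 'AQ')."""
    counts = Counter(grades[e] for e in embryo_ids)
    return {g: c / len(embryo_ids) for g, c in counts.items()}


# Hypothetical toy data in hours, not study data.
annotations = {
    "t2": {"e1": (25.0, 25.2), "e2": (26.0, 26.1), "e3": (27.0, 35.0),
           "e4": (28.0, 28.0), "e5": (29.0, None)},
    "t5": {"e1": (50.0, 50.3), "e2": (52.0, 52.1), "e3": (54.0, 62.0),
           "e4": (56.0, 55.8)},
}
flagged = flag_outliers(annotations)
print(flagged)                                # {'t2': {'e3'}, 't5': {'e3'}}
print(multi_parameter_outliers(flagged))      # {'e3'}
print(round(missing_rate(3235, 21008), 1))    # 15.4
```

Using the manual SD as the yardstick keeps the threshold independent of the automated output being tested; a threshold based on the SD of the MA-AA differences would be an equally plausible reading of the abstract and would flag a different set of embryos.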
Almost 15% (n = 40) of the embryos had outliers in more than one of the five morphokinetic parameters.
Limitations, reasons for caution
The findings of this study reflect the performance of a specific AI-based annotation algorithm against the practice of multiple clinics within the same group and country. The automated annotation algorithm was not trained on this dataset prior to validation, which is encouraging for generalisability.
Wider implications of the findings
AI is well suited to addressing these annotation challenges. This study demonstrates that where embryo quality is poor, annotation may be skewed whether it is performed manually or automatically. Once robustness is demonstrated, AI tools such as CHLOE may allow clinics to process clinical data efficiently, objectively and consistently.
Trial registration number
None