Background
Lung ultrasound can evaluate for pulmonary edema, but data suggest only moderate inter-rater reliability among users. Artificial intelligence (AI) has been proposed as a means of improving the accuracy of B line interpretation. Early data suggest a benefit among more novice users, but data are limited among average residency-trained physicians. The objective of this study was to compare the accuracy of AI versus real-time physician assessment for B lines.

Methods
This was a prospective, observational study of adult Emergency Department patients presenting with suspected pulmonary edema. We excluded patients with active COVID-19 or interstitial lung disease. A physician performed thoracic ultrasound using the 12-zone technique. The physician recorded a video clip in each zone and, based upon the real-time assessment, interpreted it as positive (≥3 B lines or a wide, dense B line) or negative (<3 B lines and the absence of a wide, dense B line) for pulmonary edema. A research assistant then used the AI program to analyze the same saved clip and determine whether it was positive or negative for pulmonary edema. The physician sonographer was blinded to this assessment. The video clips were then reviewed independently by two expert physician sonographers (ultrasound leaders with >10,000 prior ultrasound image reviews) who were blinded to the AI and initial determinations. The experts reviewed all discordant values and reached consensus on whether the field (i.e., the area of lung between two adjacent ribs) was positive or negative using the same criteria as defined above, which served as the gold standard.

Results
A total of 71 patients were included in the study (56.3% female; mean BMI: 33.4 [95% CI 30.6–36.2]), with 88.3% (752/852) of lung fields being of adequate quality for assessment. Overall, 36.1% of lung fields were positive for pulmonary edema. The physician was 96.7% (95% CI 93.8%–98.5%) sensitive and 79.1% (95% CI 75.1%–82.6%) specific.
The AI software was 95.6% (95% CI 92.4%–97.7%) sensitive and 64.1% (95% CI 59.8%–68.5%) specific.

Conclusion
Both the physician and AI software were highly sensitive, though the physician was more specific. Future research should identify which factors are associated with increased diagnostic accuracy.
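
For readers who want to reproduce this kind of field-level comparison, the following is a minimal sketch of how the positivity rule and the sensitivity and specificity estimates could be computed. It is an illustration under stated assumptions, not the study's analysis code: the function names are hypothetical, the counts in the usage note are invented for demonstration and are not study data, and the Wilson score interval is assumed here because the abstract does not state which confidence interval method was used.

```python
import math


def field_is_positive(b_line_count: int, has_wide_dense_b_line: bool) -> bool:
    """Positivity rule from the Methods: >=3 B lines or a wide, dense B line."""
    return b_line_count >= 3 or has_wide_dense_b_line


def wilson_interval(successes: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion (assumed method, z=1.96 for ~95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return center - half, center + half


def sensitivity_specificity(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Sensitivity and specificity against a reference standard, with intervals."""
    return {
        "sensitivity": tp / (tp + fn),
        "sensitivity_ci": wilson_interval(tp, tp + fn),
        "specificity": tn / (tn + fp),
        "specificity_ci": wilson_interval(tn, tn + fp),
    }


if __name__ == "__main__":
    # Hypothetical counts for illustration only; NOT taken from the study.
    result = sensitivity_specificity(tp=90, fn=10, tn=80, fp=20)
    print(result)
```

Each rater (physician or AI) would be tabulated separately against the expert consensus, giving one set of true positive, false negative, true negative, and false positive counts per rater. Because multiple fields come from the same patient, an analysis that accounts for within-patient clustering may yield wider intervals than this simple per-field calculation.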