<b>2724</b> <h3><b>Introduction:</b></h3> In developing artificial intelligence (AI) algorithms for nuclear medicine applications, several pitfalls are frequently encountered that can impede progress, lead to erroneous findings, and ultimately limit the clinical utility of algorithms. The AI Task Force of the Society of Nuclear Medicine and Molecular Imaging has identified pitfalls that commonly afflict AI algorithm development, and we provide suggestions on how best to avoid them. <h3><b>Methods:</b></h3> Here we address three of the most common and detrimental pitfalls that affect AI algorithm development, including for applications within nuclear medicine: 1) exaggerated estimates of algorithm performance (reproducibility); 2) algorithms with acceptable performance in only limited populations (generalizability); and 3) algorithms that are poorly matched to the clinical need (suitability). <h3><b>Results:</b></h3> To address the challenge of poor reproducibility (i.e., the inability to replicate previous research findings), algorithms must be evaluated on datasets that are independent of the training data. Developmental datasets should be partitioned into training and holdout testing cohorts, and performance measurements should be reported on the withheld test cohort. For all but large datasets, cross-validation methods such as nested cross-validation should be used. Data leakage, in which information from the test set influences the model’s training, should be avoided. For hypothesis testing, statistical power analysis should be used to determine the sample size of the test cohort, and preplanned statistical analyses can help avoid “p-hacking”. Code and models should be made publicly available and be sufficient to enable replication. The reporting of results in the literature should be thorough and transparent, and we recommend the use of reporting checklists.
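The reproducibility protocol above (holdout partitioning, nested cross-validation, and leakage avoidance) can be sketched in code. This is a minimal illustration, assuming scikit-learn and a synthetic feature set standing in for imaging-derived features; the classifier, fold counts, and hyperparameter grid are illustrative assumptions, not a recommendation.

```python
# Sketch of a leakage-free evaluation protocol (assumes scikit-learn).
from sklearn.datasets import make_classification
from sklearn.model_selection import (train_test_split, GridSearchCV,
                                     cross_val_score, StratifiedKFold)
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

# Synthetic features/labels standing in for a developmental dataset.
X, y = make_classification(n_samples=300, n_features=20, random_state=0)

# 1) Partition into training and withheld test cohorts BEFORE any fitting.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# 2) Keep preprocessing inside the pipeline so the scaler is fit only on
#    training folds -- fitting it on all data would leak test information.
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# 3) Nested cross-validation on the training cohort: the inner loop tunes
#    the regularization strength, the outer loop estimates performance.
inner = GridSearchCV(pipe, {"logisticregression__C": [0.1, 1.0, 10.0]},
                     cv=StratifiedKFold(5, shuffle=True, random_state=0))
outer_scores = cross_val_score(
    inner, X_train, y_train,
    cv=StratifiedKFold(5, shuffle=True, random_state=1))

# 4) Report final performance once, on the withheld test cohort only.
final_model = inner.fit(X_train, y_train)
test_accuracy = final_model.score(X_test, y_test)
```

The key design point is that the withheld test cohort is touched exactly once, after all model selection is complete, so the reported accuracy is not inflated by tuning decisions.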
To address the challenge of poor generalizability, developmental datasets should be collected from diverse sources, representing the anticipated variability of the real-world clinical population, including images from different scanner technologies. Data samples should be collected from groups that might be vulnerable to biases, and subgroup analyses should be performed. Dataset shift should be assessed by evaluating the trained model on external cohorts from different institutions. To address the challenge of poor suitability, development teams should include not just AI experts but also clinical domain experts and stakeholders, including physicians and technologists, so that the algorithm’s output can be best aligned with the clinical need. For applications in which algorithm explainability is deemed beneficial to users, algorithms should be designed so that the model’s predictions are interpretable and explainable. The confidence associated with the algorithm output should be provided whenever possible. <h3><b>Conclusions:</b></h3> The recent growth of interest in AI has led to many promising technologies in nuclear medicine but also poses several challenges, including algorithms with poor reproducibility, poor generalizability, and poor suitability for clinical tasks. By following best practices for AI algorithm development, these challenges can be overcome and developers can realize the promise of AI while avoiding the pitfalls.
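The subgroup analysis recommended above can be sketched as follows. This is a minimal illustration using NumPy with simulated predictions; the "scanner" attribute, group sizes, and error rate are hypothetical assumptions chosen only to show how a per-subgroup performance gap is surfaced.

```python
# Hedged sketch of a subgroup analysis: compare accuracy across patient
# subgroups (here a hypothetical scanner-type attribute) to flag
# performance gaps that would undermine generalizability.
import numpy as np

rng = np.random.default_rng(0)

# Simulated labels and a scanner attribute per case (hypothetical data).
labels = rng.integers(0, 2, size=200)
predictions = labels.copy()
scanners = np.where(np.arange(200) < 150, "scanner_A", "scanner_B")

# Simulate degraded performance on the under-represented scanner_B cohort
# by flipping ~30% of its predictions.
flip = (scanners == "scanner_B") & (rng.random(200) < 0.3)
predictions[flip] = 1 - predictions[flip]

# Per-subgroup accuracy: the core of the subgroup analysis.
subgroup_accuracy = {
    s: float(np.mean(predictions[scanners == s] == labels[scanners == s]))
    for s in np.unique(scanners)
}

# A large between-subgroup gap is a warning sign of bias or dataset shift.
gap = abs(subgroup_accuracy["scanner_A"] - subgroup_accuracy["scanner_B"])
```

The same per-subgroup comparison applies directly to dataset-shift evaluation: replace the scanner attribute with an institution label and compare performance on the internal cohort against each external cohort.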