Introduction

Malaria is a major public health concern, especially in developing countries. It often presents with recurrent fever, malaise, and other nonspecific symptoms that can be mistaken for influenza. Light microscopy of peripheral blood smears is considered the gold-standard diagnostic test for malaria, and delays in diagnosis can increase morbidity and mortality. Microscopy, however, is time-consuming and limited by the availability of skilled labor, infrastructure, and interobserver variability. Artificial intelligence (AI)-based screening tools can automate blood smear analysis without relying on a trained technician. Convolutional neural networks (CNNs), deep learning models that identify visual patterns, are being explored for abnormality detection in medical images. One parameter that can be tuned when training a CNN is the batch size: the number of images processed together in a single forward and backward pass. The choice of batch size in developing CNN-based malaria screening tools can affect model accuracy, training speed, and, ultimately, clinical usability. This study explores the impact of batch size on CNN model accuracy for malaria detection from thin blood smear images.

Methods

We used the publicly available "NIH-NLM-ThinBloodSmearsPf" dataset from the United States National Library of Medicine, consisting of blood smear images for Plasmodium falciparum. The collection comprises 13,779 "parasitized" and 13,779 "uninfected" single-cell images. We created four datasets, each containing all images but with a unique randomized subset held out for model testing. Using Python, four identical 10-layer CNN models were developed and trained with different batch sizes for 10 epochs on each of the four datasets, resulting in 16 sets of outputs. Model prediction accuracy, training time, and F1-score, an accuracy metric used to quantify model performance, were collected (the F1-score definition and an illustrative training sketch follow the Conclusion).

Results

All models produced F1-scores of 94%-96%, with 10 of the 16 runs producing F1-scores of 95%. After averaging the outputs of all four datasets by batch size, we observed that, as batch size increased from 16 to 128, the average combined count of false positives plus false negatives increased by 15.4% (from 130 to 150), and the average F1-score decreased by one percentage point (from 95.3% to 94.3%). The average training time also decreased by 28.11% (from 1,556 to 1,119 seconds).

Conclusion

In each dataset, we observed an approximately one-percentage-point decrease in F1-score as the batch size increased. Clinically, a 1% deviation at the population level can have a significant impact on outcomes. These results suggest that smaller batch sizes could improve accuracy in models of similar layer complexity trained on similar datasets, potentially resulting in better clinical outcomes. The reduced memory requirement of training with smaller batches also means that model training can be achieved on more economical hardware. Our findings suggest that smaller batch sizes should be evaluated for accuracy improvements when developing an AI model to screen thin blood smears for malaria.
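
For reference, the F1-score reported above is the standard harmonic mean of precision and recall, computed from true positives (TP), false positives (FP), and false negatives (FN):

\[
\mathrm{F1} = 2 \cdot \frac{\mathrm{precision} \cdot \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}},
\qquad
\mathrm{precision} = \frac{TP}{TP + FP},
\qquad
\mathrm{recall} = \frac{TP}{TP + FN}
\]

This is why the F1-score moves together with the combined false positives plus false negatives: as FP + FN rises from 130 to 150, both precision and recall fall, pulling the F1-score down from 95.3% to 94.3%.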
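
The study's code is not reproduced here; the following is a minimal sketch of the experimental setup it describes: otherwise-identical CNNs trained for 10 epochs at several batch sizes, with training time and F1-score recorded for each run. The architecture, image size, and the "data/train"/"data/test" directory layout are illustrative assumptions (using a Keras-style workflow), not the authors' exact 10-layer model or pipeline.

# Sketch: train otherwise-identical CNNs at several batch sizes and
# record training time and F1-score for each run.
import time

import numpy as np
import tensorflow as tf
from sklearn.metrics import f1_score
from tensorflow.keras import layers, models

IMG_SIZE = (64, 64)             # assumed single-cell image resolution
BATCH_SIZES = [16, 32, 64, 128]

def build_cnn() -> tf.keras.Model:
    """A small CNN for binary (parasitized vs. uninfected) classification."""
    model = models.Sequential([
        layers.Input(shape=(*IMG_SIZE, 3)),
        layers.Rescaling(1.0 / 255),             # map pixels to [0, 1]
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),
        layers.Dense(1, activation="sigmoid"),   # P(parasitized)
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model

for batch_size in BATCH_SIZES:
    # One subfolder per class ("parasitized", "uninfected") is assumed.
    train_ds = tf.keras.utils.image_dataset_from_directory(
        "data/train", image_size=IMG_SIZE, batch_size=batch_size)
    test_ds = tf.keras.utils.image_dataset_from_directory(
        "data/test", image_size=IMG_SIZE, batch_size=batch_size,
        shuffle=False)  # stable order so labels align with predictions

    model = build_cnn()          # fresh, identical architecture per run
    start = time.time()
    model.fit(train_ds, epochs=10, verbose=0)
    elapsed = time.time() - start

    y_true = np.concatenate([y.numpy() for _, y in test_ds])
    y_pred = (model.predict(test_ds, verbose=0).ravel() > 0.5).astype(int)
    print(f"batch={batch_size:4d}  time={elapsed:7.1f}s  "
          f"F1={f1_score(y_true, y_pred):.3f}")

Because each iteration rebuilds the model from scratch, the only variable that changes between runs is the batch size, mirroring the study's design of comparing identical models across batch sizes of 16 to 128.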