Diagnostic Accuracy of Utilizing Artificial Intelligence for Malaria Diagnostic: A Systematic Review and Meta-Analysis

Abstract

Background: Malaria remains a major public health concern worldwide. Microscopic examination of blood smears continues to be the gold standard for diagnosis; however, it requires considerable technical skill and expertise, which limits diagnostic accuracy in resource-poor settings. Artificial intelligence (AI) has emerged as a promising tool to support malaria detection. This systematic review provides an overview of the diagnostic performance of AI-based systems for malaria diagnosis in clinical settings.

Methods: This study followed the Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA) guidelines and included articles published within the last 10 years, retrieved from PubMed, ScienceDirect, Cochrane, EBSCO, and Wiley Online Library. Original articles that reported AI diagnostic accuracy with external validation were included. The quality of each study was evaluated using the Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) tool.

Results: Ten studies with 6754 patients were analyzed. Pooled sensitivity was 87.7% (95% CI: 78.2–93.4) and pooled specificity was 91.4% (95% CI: 77.3–97.1); these estimates indicate how closely AI agrees with a reference method when that method is treated as the gold standard. Compared with microscopy, AI achieved a sensitivity of 87.7% and a specificity of 91.4%; compared with polymerase chain reaction (PCR), it achieved a sensitivity of 90.7% and a specificity of 88.3%.

Conclusions: AI-based systems improve malaria diagnosis by offering high accuracy, automation, and lower costs. With performance comparable to reference methods such as microscopy and PCR, AI is a promising complementary tool for malaria control.
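For readers less familiar with the metrics in this abstract, the following is a minimal sketch of how sensitivity and specificity are computed for a single study from a 2x2 table against a reference method. The counts are hypothetical and are not taken from the review. The Wilson interval is one common choice of confidence interval for a single proportion and is an assumption here; the pooled intervals reported in the abstract come from a meta-analytic model, not from this formula.

```python
import math


def sensitivity(tp, fn):
    """Proportion of reference-positive cases that the index test flags."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Proportion of reference-negative cases that the index test clears."""
    return tn / (tn + fp)


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


# Hypothetical single-study counts (AI result vs. microscopy as reference).
tp, fn, fp, tn = 88, 12, 9, 91

sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
sens_lo, sens_hi = wilson_interval(tp, tp + fn)
spec_lo, spec_hi = wilson_interval(tn, tn + fp)

print(f"sensitivity = {sens:.3f} (95% CI {sens_lo:.3f} to {sens_hi:.3f})")
print(f"specificity = {spec:.3f} (95% CI {spec_lo:.3f} to {spec_hi:.3f})")
```

Because both metrics are defined relative to the reference method, the same AI output can yield different values against microscopy and against PCR, as the results above illustrate.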

Similar Papers
  • Research Article
  • Citations: 1
  • 10.1097/corr.0000000000003660
Are Artificial Intelligence Models Reliable for Clinical Application in Pediatric Fracture Detection on Radiographs? A Systematic Review and Meta-analysis.
  • Aug 20, 2025
  • Clinical orthopaedics and related research
  • Gabriel Fontenele Ximenes + 6 more

Artificial intelligence (AI) applications for pediatric fracture diagnosis using radiographs have demonstrated growing potential in clinical settings. Despite this growing potential, existing studies are limited by small sample sizes, variability in their diagnostic metrics, and inconsistent use of external validation, which reduces confidence in their findings. These limitations hinder the assessment of real-world performance. A meta-analysis would help address these gaps by pooling data to generate more robust, generalizable estimates for clinical application and future guidance. (1) What is the pooled diagnostic performance of AI models, including sensitivity, specificity, and area under the curve (AUC), for detecting pediatric fractures on radiographs? (2) What is the clinical applicability of AI models, as determined by whether their diagnostic performance is sustained in studies that employed external validation? (3) How does anatomic coverage influence the diagnostic performance of AI models? This meta-analysis adhered to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) 2020 guidelines and was registered in PROSPERO (CRD42024628342). A systematic search of PubMed/MEDLINE, Embase, and the Cochrane Library was conducted from database inception through December 9, 2024. A total of 497 records were identified. Eligible studies included pediatric patients with suspected fractures evaluated by AI models on radiographs. Studies were excluded if they lacked sufficient data to calculate sensitivity, specificity, or AUC; if they combined adult and pediatric populations; or if they focused on rib fractures. Sixteen diagnostic accuracy studies were included, involving 10,203 pediatric patients with a mean age of 8.85 years, 54% of whom were male, and 21,789 radiographs, of which 5882 showed confirmed fractures. Data extraction followed the Population, Index test, Target condition (PIT) framework and was performed independently by two reviewers.
The risk of bias was assessed using the Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) tool, which evaluates four domains (patient selection, index test, reference standard, and flow/timing) for low, high, or unclear risk. Most studies exhibited low to moderate risk of bias. Certainty of evidence was evaluated using the Grading of Recommendations Assessment, Development, and Evaluation (GRADE) approach, which classifies evidence as high, moderate, low, or very low, and in this study demonstrated high certainty of evidence. Heterogeneity in the pooled estimates was moderate for sensitivity (I² = 61%) and high for specificity (I² = 90%). No evidence of publication bias was detected based on Egger test (p = 0.54) and funnel plot symmetry. Meta-analyses used logit transformation and bivariate modeling to estimate pooled sensitivity, specificity, and AUC. The pooled analysis demonstrated a sensitivity of 93% (95% confidence interval [CI] 92% to 94%), a specificity of 91% (95% CI 88% to 93%), and an AUC of 0.96 (95% CI 0.92 to 0.97). The AUC reflects the overall ability of a model to distinguish between patients with and without fractures, with values closer to 1.0 indicating better diagnostic performance. When evaluated on external data sets, AI models maintained high diagnostic accuracy, with a sensitivity of 93% (95% CI 90% to 95%), specificity of 88% (95% CI 84% to 91%), and an AUC of 0.95 (95% CI 0.89 to 0.97), supporting their potential for clinical applicability. Anatomic coverage by specific region made a meaningful contribution to explaining the observed heterogeneity. Models evaluating multiple regions showed slightly higher sensitivity, while those focused on single regions demonstrated better specificity, suggesting that a broader anatomic scope may improve fracture detection but slightly reduce accuracy in ruling out false positives.
This meta-analysis demonstrates that AI models can accurately detect pediatric fractures on radiographs, a finding that withstood scrutiny in studies that included external validation. These findings suggest that orthopaedic surgeons and emergency physicians can consider incorporating validated convolutional neural network algorithms into workflows to enhance diagnostic accuracy, especially in acute care settings where rapid and accurate decision-making is critical. Nevertheless, future research is needed to investigate performance across specific subgroups, including sex and anatomic regions. Paired-design diagnostic accuracy studies with external geographic validation remain the most appropriate method to assess their real-world value. Such validation should be prioritized as a prerequisite for clinical generalization and democratization of AI models, even before randomized trials or prospective implementation studies. Level III, diagnostic study.
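The abstract above mentions logit transformation and an I² heterogeneity statistic. As a rough illustration only, the sketch below pools per-study sensitivities on the logit scale with inverse-variance weights and derives I² from Cochran's Q. This is a simplified univariate fixed-effect calculation, not the bivariate model the authors report using, and the study counts are hypothetical.

```python
import math


def pool_logit(studies):
    """Pool proportions on the logit scale with inverse-variance weights.

    studies: list of (events, non_events), e.g. (TP, FN) for sensitivity.
    Returns (pooled_proportion, i_squared), with I^2 as a fraction in [0, 1].
    """
    logits, weights = [], []
    for events, non_events in studies:
        # Continuity correction only when a cell is zero (avoids log(0)).
        if events == 0 or non_events == 0:
            events, non_events = events + 0.5, non_events + 0.5
        logits.append(math.log(events / non_events))
        # Approximate variance of a logit proportion is 1/events + 1/non_events.
        weights.append(1.0 / (1.0 / events + 1.0 / non_events))

    total_w = sum(weights)
    pooled_logit = sum(w * y for w, y in zip(weights, logits)) / total_w

    # Cochran's Q and I^2 = max(0, (Q - df) / Q).
    q = sum(w * (y - pooled_logit) ** 2 for w, y in zip(weights, logits))
    df = len(studies) - 1
    i_squared = max(0.0, (q - df) / q) if q > 0 else 0.0

    pooled = 1.0 / (1.0 + math.exp(-pooled_logit))  # back-transform
    return pooled, i_squared


# Hypothetical (TP, FN) counts from four studies.
example_studies = [(90, 10), (45, 5), (170, 30), (60, 2)]
pooled_sens, i2 = pool_logit(example_studies)
print(f"pooled sensitivity = {pooled_sens:.3f}, I^2 = {i2:.0%}")
```

A bivariate random-effects model additionally estimates between-study variance and the correlation between sensitivity and specificity, so its pooled estimates and intervals will generally differ from this simplified calculation.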

  • Research Article
  • 10.11124/01938924-201109641-00011
The accuracy of Influenza A (H1N1) “swine flu” laboratory testing: A systematic review of diagnostic test accuracy.
  • Jan 1, 2011
  • JBI Database of Systematic Reviews and Implementation Reports
  • Sarahlouise White + 2 more

  • Research Article
  • 10.11124/jbisrir-2010-847
The accuracy of Influenza A (H1N1) “swine flu” laboratory testing: A systematic review of diagnostic test accuracy.
  • Jan 1, 2010
  • JBI Library of Systematic Reviews
  • Sarahlouise White + 2 more

  • Front Matter
  • Citations: 3
  • 10.1097/corr.0000000000001708
Editorial: What Readers and Clinician Scientists Need to Know About the "Other" EQUATOR.
  • Mar 11, 2021
  • Clinical Orthopaedics & Related Research
  • Seth S Leopold + 1 more

  • Research Article
  • Citations: 3
  • 10.1016/j.annemergmed.2010.02.008
The Conduct and Reporting of Meta-Analyses of Studies of Diagnostic Tests, and a Consideration of ROC Curves: Answers to the January 2010 Journal Club Questions
  • May 21, 2010
  • Annals of Emergency Medicine
  • Teri A Reynolds + 1 more

  • Research Article
  • 10.7759/cureus.95484
Comparative Effectiveness of Artificial Intelligence Versus Conventional Methods for Detecting Peritoneal Metastasis in Colorectal Cancer: A Systematic Review.
  • Oct 27, 2025
  • Cureus
  • Mohamed Elsaigh + 7 more

Colorectal cancer represents a major global malignancy and a leading cause of cancer-related death. Peritoneal metastasis occurs in a significant proportion of colorectal cancer patients and is associated with markedly worse prognosis compared to other metastatic sites, with a limited median overall survival. Early detection remains challenging due to the limited sensitivity of conventional imaging techniques, with computed tomography exhibiting poor detection rates for small lesions and necessitating invasive diagnostic procedures for accurate diagnosis. The limitations of traditional diagnostic modalities have driven a growing interest in artificial intelligence applications to advance the early, non-invasive detection of peritoneal metastasis. This study aimed to systematically assess whether artificial intelligence and machine learning approaches enhance the accuracy and efficiency of detecting peritoneal metastasis and predicting tumor spread patterns compared to conventional imaging and clinical assessment methods in patients with colorectal cancer. A systematic review was conducted in accordance with Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines, searching the PubMed, Web of Science, Cochrane, Embase, and Scopus databases for studies published between 2015 and 2025. The search strategy included comprehensive terminology related to artificial intelligence and machine learning, combined with terms related to peritoneal metastasis. Two independent reviewers assessed study quality using the Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) tool for diagnostic accuracy studies and the modified Radiomics Quality Score for artificial intelligence (AI) and radiomics studies, with disagreements resolved through consensus discussions. Twenty-two studies from multiple countries were included, with a total population of over 40,000 patients. AI applications consistently outperformed traditional methods across all modalities.
While conventional approaches showed moderate performance with C-indices of 0.73-0.85 and CT imaging missed 89% of small lesions, AI-assisted systems demonstrated superior results as follows: cytological detection achieved over 95% accuracy and 99% specificity; radiomics models reached AUCs up to 0.941; circulating tumor DNA integration provided 8.5-fold increased risk identification; and computer-assisted staging laparoscopy improved surgical diagnostic accuracy from 52% to 79% compared to human assessment alone. AI technologies demonstrate promising advantages for peritoneal metastasis detection, offering enhanced diagnostic accuracy, objective assessments, faster analysis, and improved clinical decision-making, particularly through human-AI collaboration. However, most studies lack external validation across diverse populations and real-world settings, while current implementations face significant workflow challenges. Before clinical adoption, future research must prioritize large-scale prospective validation studies, external validation across diverse populations, and comprehensive cost-effectiveness analyses to ensure safe and effective integration into clinical practice.

  • Research Article
  • Citations: 1
  • 10.1001/jamanetworkopen.2025.33512
Use of AI in Identification of Sexually Transmitted Infections and Anogenital Dermatoses
  • Oct 3, 2025
  • JAMA Network Open
  • Nyi Nyi Soe + 11 more

Artificial intelligence (AI) excels in dermatology. However, its applications to sexually transmitted infections (STIs) remain unclear. This review aimed to assess the performance of AI algorithms and their applications in detecting STIs and anogenital dermatoses from clinical images in sexual health. Six databases (IEEE Xplore, Embase, Scopus, Medline, Web of Science, and CINAHL) were searched for studies published from January 1, 2010, to April 12, 2024, using 3 main concepts: artificial intelligence, diagnosis, and sexually transmitted infections. Studies that used AI to identify anogenital skin conditions from clinical images were included. Studies that used non-AI approaches or addressed nonanogenital conditions, as well as reviews and studies lacking performance metrics, were excluded. Following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines, 2 reviewers independently assessed full-text articles and extracted data using a standardized spreadsheet. Another 2 reviewers resolved any disagreements. A modified Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) critical appraisal tool and the Checklist for Evaluation of Image-Based AI Reports in Dermatology (CLEAR Derm) were used for quality assessment. The main outcomes were the pooled sensitivity and specificity of AI applications for detecting anogenital skin conditions. A bivariate random-effects meta-analysis was conducted for conditions with more than 3 studies. Of 5381 studies screened and 258 full texts selected, 140 met the inclusion criteria. Most studies reported on mpox (110 [78.6%]), while other anogenital conditions, including genital herpes (7 [5.0%]), genital warts (8 [5.7%]), scabies (8 [5.7%]), and molluscum contagiosum (6 [4.3%]), received less attention.
Meta-analyses showed high performance of AI for identification of mpox (pooled sensitivity: 0.96 [95% CI, 0.93-0.97]; pooled specificity: 0.98 [95% CI, 0.97-0.99]), herpes simplex (sensitivity: 0.91 [95% CI, 0.71-0.98]; specificity: 0.97 [95% CI, 0.94-0.98]), genital warts (sensitivity: 0.87 [95% CI, 0.67-0.96]; specificity: 0.98 [95% CI, 0.95-0.99]), psoriasis (sensitivity: 0.90 [95% CI, 0.78-0.95]; specificity: 0.98 [95% CI, 0.96-0.99]), and scabies (sensitivity: 0.89 [95% CI, 0.84-0.93]; specificity: 0.98 [95% CI, 0.95-0.99]). Study quality was variable, and the assessment identified high risk of bias across the population selection (76.1%), reference standards (76.1%), and index tests (20.0%). Most studies relied on open-source datasets (121 [86.4%]); only 17 (12.1%) used external validation. All but 1 study (0.7%) remained at the proof-of-concept stage, and models were not publicly available for external evaluation. The findings suggest that AI shows promise in identifying STIs and anogenital dermatoses but that significant research gaps exist. Future work should prioritize understudied STIs and differential conditions while improving data quality, conducting external validation, and validating findings in clinical settings.

  • Research Article
  • 10.1016/j.ajog.2025.05.004
Diagnostic accuracy of cell-free DNA for the determination of fetal red blood cell antigen genotype: a systematic review and meta-analysis.
  • Nov 1, 2025
  • American journal of obstetrics and gynecology
  • Hiba J Mustafa + 6 more

  • Front Matter
  • Citations: 29
  • 10.4269/ajtmh.2012.11-0619
How Do We Best Diagnose Malaria in Africa?
  • Feb 1, 2012
  • The American Journal of Tropical Medicine and Hygiene
  • Philip J Rosenthal

  • Research Article
  • 10.22146/tmj.5869
Validity of p-LDH/HRP2-Based Rapid Diagnostic Test for the Diagnosis of Malaria on Pregnant Women in Maluku
  • Feb 25, 2015
  • Vebiyanti Vebiyanti + 2 more

Introduction: Pregnant women are among the groups at risk of infection with malaria parasites in endemic areas. The dangerous consequences of malaria in pregnancy are anemia and severe malaria, which can cause the death of the mother, fetus, and newborn. Clinical presentation in pregnancy is often atypical or even asymptomatic, which is one of the obstacles to diagnosing malaria in pregnancy in endemic areas. The p-LDH/HRP2-RDT (Pf/Pan) is one of the RDT products recommended by the WHO in rounds 1-4 and has been used in Maluku. This tool detects antigens produced by Plasmodium metabolism in peripheral blood and is therefore regarded as more sensitive than microscopic examination. The use of the p-LDH/HRP2-RDT (Pf/Pan) for the detection of P. falciparum HRP-2 antigen and P. vivax, P. malariae, and P. ovale p-LDH antigen has not previously been evaluated in the Province of Maluku. Objectives: To evaluate the validity of the p-LDH/HRP2-RDT (Pf/Pan) compared with microscopic examination and nested polymerase chain reaction (PCR) as the gold standard for the diagnosis of malaria in pregnancy in Maluku. Methods: This was a cross-sectional diagnostic test study of malaria in pregnant women. The study was conducted at the Ambon City health center, the Savana Jaya health center on Buru Island, and Haulussy Ambon Local Hospital. Sample data, pregnancy data, RDT results, and field microscopy results were recorded in a questionnaire. Nested PCR and a second reading of the microscopic examination were performed at the Laboratory of Parasitology, Faculty of Medicine, Universitas Gadjah Mada. Results: With nested PCR as the gold standard, the p-LDH/HRP2-RDT (Pf/Pan) had the same sensitivity as microscopy (11%) and a higher specificity (100% versus 96%); its PPV and NPV were 100% and 98%, respectively. Compared with microscopic examination, the sensitivity of the p-LDH/HRP2-RDT (Pf/Pan) was 80%. Conclusion: For diagnosing malaria in pregnancy in Maluku, the p-LDH/HRP2-RDT (Pf/Pan) was less sensitive than nested PCR and microscopy. Keywords: Malaria, pregnant woman, diagnostic test, validity, p-LDH/HRP2 Rapid Diagnostic Test (RDT) (Pf/Pan)
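The predictive values in this abstract depend on disease prevalence as well as on sensitivity and specificity. The sketch below shows that relationship through Bayes' rule. The sensitivity (11%) and specificity (100%) are the values reported against nested PCR; the prevalence of 2% is a hypothetical figure chosen only for illustration, since the abstract does not state it.

```python
def positive_predictive_value(sens, spec, prevalence):
    """P(disease | test positive) from Bayes' rule."""
    true_pos = sens * prevalence
    false_pos = (1.0 - spec) * (1.0 - prevalence)
    if true_pos + false_pos == 0:
        raise ValueError("no positive test results expected")
    return true_pos / (true_pos + false_pos)


def negative_predictive_value(sens, spec, prevalence):
    """P(no disease | test negative) from Bayes' rule."""
    true_neg = spec * (1.0 - prevalence)
    false_neg = (1.0 - sens) * prevalence
    if true_neg + false_neg == 0:
        raise ValueError("no negative test results expected")
    return true_neg / (true_neg + false_neg)


# Reported sensitivity and specificity; prevalence is an assumed value.
sens, spec = 0.11, 1.00
assumed_prevalence = 0.02  # hypothetical, not from the abstract

ppv = positive_predictive_value(sens, spec, assumed_prevalence)
npv = negative_predictive_value(sens, spec, assumed_prevalence)
print(f"PPV = {ppv:.0%}, NPV = {npv:.0%}")
```

The calculation illustrates why a test with very low sensitivity can still show a high NPV: when the condition is rare in the sample, most negative results are correct regardless of how many cases the test misses.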

  • Research Article
  • Citations: 30
  • 10.1016/j.ejmp.2021.03.015
Performance of an artificial intelligence tool with real-time clinical workflow integration - Detection of intracranial hemorrhage and pulmonary embolism.
  • Mar 1, 2021
  • Physica Medica
  • Nico Buls + 4 more

  • Research Article
  • Citations: 38
  • 10.1016/j.jacr.2021.11.008
Independent External Validation of Artificial Intelligence Algorithms for Automated Interpretation of Screening Mammography: A Systematic Review.
  • Feb 1, 2022
  • Journal of the American College of Radiology
  • Anna W Anderson + 7 more

  • Research Article
  • Citations: 42
  • 10.1016/j.fertnstert.2020.10.040
Predictive modeling in reproductive medicine: Where will the future of artificial intelligence research take us?
  • Nov 1, 2020
  • Fertility and Sterility
  • Carol Lynn Curchoe + 18 more

  • Supplementary Content
  • Citations: 7
  • 10.1111/tmi.13193
Urinary circulating DNA and circulating antigen for diagnosis of schistosomiasis mansoni: a field study.
  • Jan 8, 2019
  • Tropical Medicine & International Health
  • Radwa Galal Diab + 3 more

To evaluate three non-invasive assays for the diagnosis of schistosomiasis mansoni in an Egyptian village. Urine was collected for the detection of circulating cathodic antigen (CCA) and cell-free parasite DNA (cfpd) by Point-of-contact (POC)-cassette assay and PCR, respectively. These tests were compared to Kato-Katz (KK) faecal thick smear for detection of Schistosoma mansoni eggs. Disease prevalence by POC-CCA assay was 86%; by PCR it was 39% vs. 27% by KK. Compared to KK, the sensitivity of POC-CCA reached 100%, but its specificity was only 19.2% with 41% accuracy. Sensitivity of the PCR assay for cfpd was 55.56%, and specificity was 67.12% with 64% accuracy. A new end point was calculated for combined analysis of KK, POC-CCA assay and PCR. Sensitivity for the three tests was 52.94%, 90.2% and 76.47%; specificity was 100% for KK and PCR and 18.37% for POC-CCA. The accuracy calculated for the three tests at the end point was 76% for KK, 55% for POC-CCA assay and 88% for PCR. Conventional PCR assay for detection of cfpd provides a potential screening tool for intestinal schistosomiasis with reliable specificity, reasonable accuracy and affordable financial and technical cost.
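The accuracy figures in this abstract follow from sensitivity, specificity, and the prevalence defined by the reference test: overall accuracy equals sensitivity × prevalence + specificity × (1 − prevalence). The short check below applies that identity to the values reported against Kato-Katz, taking the rounded 27% Kato-Katz prevalence as the reference prevalence, which is an assumption about how the figures were derived.

```python
def overall_accuracy(sens, spec, prevalence):
    """Fraction of all subjects classified correctly by the index test."""
    return sens * prevalence + spec * (1.0 - prevalence)


# Prevalence by Kato-Katz (reference test), as reported in the abstract.
kk_prevalence = 0.27

# Sensitivity and specificity reported for each assay against Kato-Katz.
poc_cca_accuracy = overall_accuracy(1.0, 0.192, kk_prevalence)
pcr_accuracy = overall_accuracy(0.5556, 0.6712, kk_prevalence)

print(f"POC-CCA accuracy: {poc_cca_accuracy:.0%}")  # prints "POC-CCA accuracy: 41%"
print(f"PCR accuracy: {pcr_accuracy:.0%}")          # prints "PCR accuracy: 64%"
```

Both results agree with the reported 41% and 64%, which shows how a test with 100% sensitivity can still have low overall accuracy when its specificity is poor and most subjects are reference-negative.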

  • Supplementary Content
  • 10.7759/cureus.96155
Advanced Imaging Modalities in Gastrointestinal Endoscopy: A Systematic Review of Diagnostic Accuracy and Clinical Impact
  • Nov 5, 2025
  • Cureus
  • Anas E Ahmed + 9 more

Conventional white light endoscopy (WLE) has limited sensitivity for detecting subtle gastrointestinal lesions, leading to missed diagnoses and interval cancers. Advanced imaging modalities, including narrow band imaging (NBI), linked color imaging (LCI), blue laser imaging (BLI), autofluorescence imaging (AFI), and artificial intelligence (AI)-based systems, have been developed to enhance lesion detection, characterization, and diagnostic confidence. This systematic review, conducted in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines, evaluated the diagnostic accuracy and clinical impact of these technologies compared with WLE and other standard approaches. Comprehensive searches of PubMed, Cochrane Central Register of Controlled Trials (CENTRAL), Scopus, and Web of Science (from inception to September 2025) identified 14,288 records, of which 10,590 unique studies were screened, and 21 met the inclusion criteria. NBI demonstrated moderate-to-high diagnostic accuracy but offered limited improvement over high-definition WLE in adenoma detection. LCI improved sensitivity and lesion visibility across gastric, esophageal, and inflammatory disorders, showing strong performance in early gastric cancer and reflux esophagitis. BLI and BLI-bright enhanced real-time gastric cancer detection and supported colorectal optical diagnosis comparable to NBI. AFI showed inferior performance to dye-based chromoendoscopy in ulcerative colitis surveillance but effectively distinguished reflux phenotypes with high accuracy. AI-based systems consistently increased adenoma detection rate (ADR) and adenomas per colonoscopy (APC) through computer-aided detection (CADe) and achieved clinically actionable accuracy for optical diagnosis in selected settings using computer-aided diagnosis (CADx). Overall, advanced imaging modalities improve lesion detection and diagnostic precision compared with WLE, each offering distinct advantages. 
LCI and BLI are most effective for early gastric cancer and inflammatory conditions, NBI remains a validated tool for targeted diagnosis, and AI-based systems provide the greatest gains in adenoma yield while showing promise for reliable real-time optical diagnosis. Standardization, multicenter validation, and integration into clinical practice guidelines are essential to optimize their impact on patient outcomes.
