Background and aims. Limited access to integrated systems restricts the use of real-world medical data for translational research. ARGO (Automatic Report Generation for Onco-hematology) converts paper-based pathology reports into electronic Case Report Forms (eCRFs) using Optical Character Recognition and Natural Language Processing technologies [Zaccaria et al., Sci. Rep., 2021]. Here, we present for the first time the App version of ARGO, designed to help physicians and data-entry staff rapidly generate eCRFs in a standardized, filterable format. To assess scalability, we tested the ARGO App on n. 501 pathology reports from eight Italian Centers, including Hodgkin (HL), Diffuse Large B-Cell (DLBCL), Follicular (FCL), Mantle Cell (MCL), and T-Cell (TCL) lymphoma diagnoses.

Methods. Validation involved six expert hematologists, who generated eCRFs simply by photographing each paper-based report with commercially available camera-equipped smartphones (Apple® iPhones, iOS version 15). The set included n. 347 reports from IRCCS Istituto Tumori 'Giovanni Paolo II' (internal series, IS) and n. 154 reports from seven Italian cooperative Centers (external series, ES). Overall, it comprised n. 139 HL, n. 154 DLBCL, n. 109 FCL, n. 76 MCL, n. 6 TCL, and n. 17 unclassified reports, describing major immunohistochemistry markers (IMs) such as MYC, BCL2, BCL6, CD10, CD20, Cyclin D1, CD79a, CD15, CD30, PAX5, CD5, CD3, and the Ki-67 proliferation index. The ARGO algorithm (developed in Python) assigned each diagnosis automatically according to the highest matching rate between the detected IMs and the corresponding classification of the National Institute of Health, in accordance with the International Classification of Diseases, 10th revision (ICD-10), and the International Classification of Diseases for Oncology (ICD-O).
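The "highest matching rate" rule can be illustrated with a minimal sketch. Everything below is an assumption for illustration: the marker profiles are simplified and hypothetical (not ARGO's reference table, not the NIH/ICD classification, and not clinical criteria), and the names `PROFILES`, `matching_rate`, `assign_diagnosis`, the `min_rate` threshold and the tie rule are invented here. Each diagnosis is scored by the share of its profile markers whose detected status agrees with the report, and the best-scoring diagnosis is returned unless the evidence is weak or ambiguous. The Random Forest combination described in the Methods is not modeled.

```python
from typing import Dict, Optional, Tuple

# HYPOTHETICAL, simplified marker profiles for illustration only: they are not
# ARGO's reference table, not the NIH/ICD classification, and not clinical
# diagnostic criteria. True = expected positive, False = expected negative.
PROFILES: Dict[str, Dict[str, bool]] = {
    "HL": {"CD15": True, "CD30": True, "CD20": False, "CD3": False},
    "DLBCL": {"CD20": True, "CD79a": True, "PAX5": True, "CD3": False},
    "FCL": {"CD20": True, "CD10": True, "BCL2": True, "BCL6": True},
    "MCL": {"CD20": True, "CD5": True, "Cyclin D1": True, "CD10": False},
    "TCL": {"CD3": True, "CD5": True, "CD20": False},
}


def matching_rate(detected: Dict[str, bool], profile: Dict[str, bool]) -> float:
    """Share of a profile's markers whose detected status agrees with it.

    Markers missing from the report count as non-matching, so a sparse
    report cannot reach a high rate from a single marker.
    """
    agree = sum(1 for marker, expected in profile.items()
                if detected.get(marker) == expected)
    return agree / len(profile)


def assign_diagnosis(
    detected: Dict[str, bool],
    profiles: Dict[str, Dict[str, bool]] = PROFILES,
    min_rate: float = 0.5,  # hypothetical acceptance threshold
) -> Tuple[Optional[str], float]:
    """Return the diagnosis with the highest matching rate, plus that rate.

    The diagnosis is None (report left unclassified) when the best rate is
    below min_rate or when two diagnoses tie for the best rate.
    """
    scores = {dx: matching_rate(detected, p) for dx, p in profiles.items()}
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    best_dx, best_rate = ranked[0]
    tied = len(ranked) > 1 and ranked[1][1] == best_rate
    if best_rate < min_rate or tied:
        return None, best_rate
    return best_dx, best_rate


# Usage: an MCL-like report (CD20+, CD5+, Cyclin D1+, CD10-, CD3-).
report = {"CD20": True, "CD5": True, "Cyclin D1": True, "CD10": False, "CD3": False}
print(assign_diagnosis(report))  # ('MCL', 1.0)
```

Counting missing markers as mismatches is one design choice among several; scoring only the markers actually reported would favor sparse reports and produce more ties.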
To reduce the risk of misdiagnosis, a Random Forest (RF) model was trained on the IM set of the IS, tested on the ES, and combined with the ARGO algorithm. App performance was assessed by accuracy and by F1-score, the harmonic mean of precision and recall, which is less inflated by class imbalance than accuracy.

Results. The ARGO App supports two use cases: prospective acquisition and retrospective consultation of reports by users (physicians and/or data managers). In the first use case, users acquire pathology reports with the mobile phone's camera. In the second, users search patients' data by filtering on "Report ID", "Name", "Surname", and "Type of diagnosis". For each new record, ARGO converts information on patient demographics, diagnosis, tissue of origin of the sample (lymph node, extra-nodal, bone marrow, or peripheral blood), and IM expression. ARGO successfully converted 490 (97.8%) reports into structured eCRFs (n. 18,816 data points overall). In terms of accuracy (Fig. 1A), detection of MYC, Cyclin D1, CD79a, CD15, EMA, BCL2 (by fluorescence in situ hybridization), IgD, IgM, EBV, and Ki-67 ranged between 85.7% and 99.4% in both series. BCL2, BCL6, CD10, CD20, CD30, PAX5, CD23, CD5, and CD45/LCA ranged between 65.1% and 85.6% for the IS and between 71.0% and 90.9% for the ES. MUM1 and CD3 achieved 56.2% and 71.4%, and 72.0% and 79.9%, for the IS and ES, respectively. Concerning the F1-score (Fig. 1B), although no significant differences were observed between the two series, biomarker scores were on average 20.2% (IS) and 35.3% (ES) lower than the corresponding accuracy. Notably, the Ki-67 proliferation index, MYC, CD10, CD20, Cyclin D1, CD79a, CD15, CD23, and CD5 achieved F1-scores between 73.5% and 85.9% for the IS and between 72.4% and 85.5% for the ES. Diagnosis capture achieved an accuracy of 87.3% (IS) and 82.5% (ES) and an F1-score of 87.3% (IS) and 83.0% (ES). In the ES, individual diagnoses of HL, MCL, DLBCL, FCL, and TCL reached 90.0%, 88.5%, 84.9%, 76.5%, and 33.3%, respectively. Of these n. 154 reports, n. 52 (34%) were classified by the ARGO algorithm alone, n. 28 (18%) by the RF alone, and n. 67 (43%) by the combination of both, while n. 7 (5%) remained unclassified.

Conclusions. We validated the ARGO App, which robustly converts paper-based pathology reports of major lymphoma subtypes into structured eCRFs. ARGO is feasible and easily transferable into daily practice to generate standardized clinical records for clinical and translational research. Ongoing efforts aim to enlarge the TCL cohort of pathology reports and to develop a multilingual version supporting languages other than Italian.

Figure 1
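The Results contrast accuracy with F1-score for each marker. The abstract does not state how the F1-score was averaged, so the sketch below assumes macro-averaged F1 over the label values of a single marker; the labels "pos"/"neg" and the function names are hypothetical, and the data are toy values, not study data. It shows how F1 can fall well below accuracy when one label dominates.

```python
from typing import Hashable, Sequence


def _check(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> None:
    """Reject empty or mismatched label sequences."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("label sequences must be non-empty and equal in length")


def accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Fraction of reports whose extracted label equals the gold label."""
    _check(y_true, y_pred)
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of per-label F1 (harmonic mean of precision and recall)."""
    _check(y_true, y_pred)
    pairs = list(zip(y_true, y_pred))
    f1_scores = []
    for label in sorted(set(y_true) | set(y_pred), key=str):
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1_scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Toy data (NOT study data): one marker over 10 reports, 8 gold "pos" and
# 2 gold "neg"; the extractor labels every report "pos".
gold = ["pos"] * 8 + ["neg"] * 2
pred = ["pos"] * 10
print(accuracy(gold, pred))            # 0.8
print(round(macro_f1(gold, pred), 3))  # 0.444
```

Because the dominant label is always predicted, accuracy stays at 0.8 while the minority label's F1 is 0, pulling macro-F1 down to about 0.44. A similar mechanism may contribute to the gap between F1-score and accuracy reported above, although the averaging actually used is not specified.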