Abstract
Pathologic evaluation of bone marrow is diagnostically essential for many hematologic diseases, and quantification of cellular composition is an important step in the pathologic work-up. Many diagnostic criteria for hematologic diseases include precise cellular thresholds, for example 10% and 60% plasma cells in plasma cell neoplasms (PCN). Because these thresholds affect treatment decisions, accurate cellular quantification is necessary. Quantification is typically performed manually and is subject to error, including interoperator variability and heterogeneity of cellular composition within aspirate smears. Rumke statistics were developed to estimate the error of manual differential counts of hematologic specimens, expressed as statistical confidence intervals. We set out to determine how the statistical uncertainty of plasma cell percentages from manual differential counts could affect the diagnosis and classification of PCN. We extracted from the electronic medical record (EMR) a large data set containing pathologic diagnoses and laboratory data for 16,737 cases of PCN. Using the R programming language, we developed an automated method to extract plasma cell percentages from unstructured text containing manual bone marrow differential counts. Differential cell counts with plasma cell percentages were available for 4,523 cases, of which 2,445 were initial cases in the data set. For each initial case, we calculated Rumke statistics as the Clopper-Pearson 95% confidence interval. We then identified cases in which the plasma cell Rumke interval encompassed the 10% or 60% threshold, meaning that diagnostic categorization could change. Of the initial cases, 228 (9.3%) had intervals encompassing a threshold (204 at 10% and 24 at 60%). To determine whether diagnostic categorization would be affected, we screened for laboratory-based myeloma-defining events (MDE): hemoglobin <10 g/dL, serum creatinine >2 mg/dL, and total calcium >11 mg/dL.
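The screening steps above can be sketched as follows. This is a minimal Python illustration, not the authors' R implementation. The report wording matched by the hypothetical `PLASMA_RE` pattern, all helper names, and the assumption that the total number of cells counted is known (needed to convert a percentage back into a count) are illustrative assumptions. The Clopper-Pearson bounds are found by bisection on the binomial CDF so that the block needs only the standard library.

```python
import math
import re
from typing import Optional, Tuple

# Hypothetical report wording, e.g. "Plasma cells: 12%" or "plasma cells 12.5 %".
# Real free-text reports vary; this pattern is an illustrative assumption.
PLASMA_RE = re.compile(r"plasma\s+cells?\s*[:=]?\s*(\d+(?:\.\d+)?)\s*%", re.IGNORECASE)


def extract_plasma_pct(text: str) -> Optional[float]:
    """Return the first plasma cell percentage found in free text, or None."""
    match = PLASMA_RE.search(text)
    return float(match.group(1)) if match else None


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p); direct summation, adequate for n up to ~1000."""
    if k < 0:
        return 0.0
    if k >= n:
        return 1.0
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def clopper_pearson(x: int, n: int, alpha: float = 0.05) -> Tuple[float, float]:
    """Exact (Clopper-Pearson) two-sided interval for x successes in n trials."""
    if n <= 0 or not 0 <= x <= n:
        raise ValueError("need n > 0 and 0 <= x <= n")
    half = alpha / 2

    def bisect(below_root) -> float:
        # below_root(p) is True while p is still below the bound being sought.
        lo, hi = 0.0, 1.0
        for _ in range(100):
            mid = (lo + hi) / 2
            if below_root(mid):
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower bound: p where P(X >= x) = alpha/2 (defined as 0 when x == 0).
    lower = 0.0 if x == 0 else bisect(lambda p: 1 - binom_cdf(x - 1, n, p) < half)
    # Upper bound: p where P(X <= x) = alpha/2 (defined as 1 when x == n).
    upper = 1.0 if x == n else bisect(lambda p: binom_cdf(x, n, p) > half)
    return lower, upper


def straddles(lower: float, upper: float, threshold: float) -> bool:
    """True if the interval encompasses the threshold (all values as fractions)."""
    return lower <= threshold <= upper


def has_lab_mde(hgb: Optional[float], creat: Optional[float],
                ca: Optional[float]) -> Optional[bool]:
    """Screen hemoglobin <10 g/dL, creatinine >2 mg/dL, total calcium >11 mg/dL."""
    if (hgb is not None and hgb < 10) or (creat is not None and creat > 2) \
            or (ca is not None and ca > 11):
        return True
    if hgb is None or creat is None or ca is None:
        return None  # assumption: incomplete labs can neither confirm nor exclude MDE
    return False


if __name__ == "__main__":
    report = "Differential: blasts 1%, Plasma cells: 12%, lymphocytes 20%"
    pct = extract_plasma_pct(report)
    n_cells = 200  # assumed differential total; the abstract does not state it
    x = round(pct / 100 * n_cells)
    lo, hi = clopper_pearson(x, n_cells)
    print(pct, x, lo, hi, straddles(lo, hi, 0.10), straddles(lo, hi, 0.60))
    print(has_lab_mde(9.1, 1.2, 9.8))
```

Treating missing laboratory values as `None` rather than `False` is a design choice in this sketch. It keeps cases with incomplete data separate from cases screened negative, so their MDE status stays explicitly unknown.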
At the 10% plasma cell threshold, 50 cases had one or more laboratory MDE and 127 had none. In cases with MDE, a plasma cell percentage at or above 10% meets criteria for plasma cell myeloma (PCM), whereas a value below 10% would likely prompt further work-up, including diagnostic imaging. In cases without MDE, crossing the 10% threshold changes the diagnostic category from monoclonal gammopathy of undetermined significance (MGUS) to smoldering myeloma. In our data set, all cases whose intervals straddled the 60% threshold met criteria for PCM, whether or not they crossed it. These findings show that, in a substantial number of PCN cases, differential cell counts near diagnostic thresholds fall within the range of counting error and could change diagnostic categorization. This uncertainty must be recognized and taken into account when categorizing a patient's disease. This work also highlights the need for more accurate and representative methods of generating differential cell counts, such as automated analysis of whole slide digital images. Finally, our methods demonstrate the utility of R for extracting meaningful data from unstructured free text at large scale.
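The categorization logic described above can be sketched as a small decision function. This is a deliberately simplified illustration using only the plasma cell fraction and the laboratory MDE flag. It is an assumption-laden stand-in, not a full classification: real criteria also weigh M-protein levels, free light chain ratio, imaging findings, and other factors that this hypothetical `categorize` ignores. Because the category is constant between the 10% and 60% cutoffs, evaluating the interval endpoints plus any cutoff lying inside the interval is enough to list every category the interval could support.

```python
from typing import Set

THRESHOLDS = (0.10, 0.60)  # plasma cell fractions used as cutoffs in the abstract


def categorize(pc_fraction: float, has_mde: bool) -> str:
    """Simplified PCN category from plasma cell fraction and laboratory MDE.

    Illustrative only: ignores M-protein, free light chains, imaging, and
    other criteria that an actual diagnosis requires.
    """
    if pc_fraction >= 0.60:
        return "PCM"  # >= 60% plasma cells is treated as myeloma-defining
    if pc_fraction >= 0.10:
        return "PCM" if has_mde else "smoldering myeloma"
    # Below 10%: laboratory MDE alone would prompt further work-up (e.g. imaging).
    return "further work-up" if has_mde else "MGUS"


def categories_across_interval(lower: float, upper: float, has_mde: bool) -> Set[str]:
    """All categories reachable by a plasma cell fraction inside [lower, upper]."""
    points = [lower, upper] + [t for t in THRESHOLDS if lower <= t <= upper]
    return {categorize(p, has_mde) for p in points}


if __name__ == "__main__":
    # Hypothetical intervals: one straddling 10% without MDE, one straddling 60% with MDE.
    print(categories_across_interval(0.062, 0.15, has_mde=False))
    print(categories_across_interval(0.53, 0.67, has_mde=True))
```

A result containing more than one category marks a case whose diagnosis depends on where the true plasma cell fraction lies within its counting-error interval.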