Related Article, see p 1474

KEY POINT: “…the object of statistical methods is the reduction of data. A quantity of data …is to be replaced by relatively few quantities which shall adequately represent the whole.” (Sir Ronald Fisher, 1922)

In this issue of Anesthesia & Analgesia, Meersch et al1 report results of an observational study on the role of vascular adhesion protein (VAP)-1 in the development of acute kidney injury (AKI) after cardiac surgery. The authors are commended for applying a range of descriptive statistics to characterize the patients, and for using these statistics appropriately and reporting them clearly (Figure).

Figure. Excerpt of Table 1 from Meersch et al.1 Nominal (sex) and ordinal (ASA physical status score) data are reported as counts (absolute frequencies) and percentages (relative frequencies). Note that the numerators (counts per category per group: inside red squares) and denominators (total number of patients per group: inside blue squares) are appropriately presented. Quantitative data are reported as the mean (SD), which is generally preferred to mean ± SD, or as the median with the Q1, Q3 values. ASA indicates American Society of Anesthesiologists; SD, standard deviation.

The choice of the most appropriate summary statistic depends on the type and distribution of data. The following data types are distinguished:2 Nominal data have 2 (eg, sex) or more (eg, blood type) nonordered categories. Ordinal data have categories with a logical order (eg, American Society of Anesthesiologists [ASA] physical status score). Interval data are numeric and have an arbitrary zero point (eg, temperature in Celsius), while ratio data have an absolute zero point (eg, weight).

Nominal and ordinal data are correctly summarized with the counts (absolute frequencies) and proportions or percentages (relative frequencies) of observations within the categories. When reporting relative frequencies, the numerator and denominator should both be presented.
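As a minimal sketch of reporting categorical data with both numerator and denominator, the Python snippet below tabulates counts and percentages per category. The function name `frequency_table` and the ASA scores are hypothetical and chosen only for illustration; they are not taken from Meersch et al.

```python
from collections import Counter


def frequency_table(values):
    """Return {category: (count, total, percent)} for nominal or ordinal data."""
    total = len(values)        # denominator: number of observations
    counts = Counter(values)   # numerators: observations per category
    return {cat: (n, total, 100.0 * n / total) for cat, n in counts.items()}


# Hypothetical ASA physical status scores for 8 patients (illustrative only).
asa_scores = [2, 3, 3, 2, 4, 3, 2, 3]
table = frequency_table(asa_scores)

# Report each category as "count/total (percent)" so both the numerator
# and the denominator are visible to the reader.
for category in sorted(table):
    count, total, percent = table[category]
    print(f"ASA {category}: {count}/{total} ({percent:.1f}%)")
```

Printing the count and the total alongside the percentage lets a reader verify each relative frequency directly.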
Interval and ratio data can also be summarized by frequencies within specific strata of the quantitative data (eg, number of patients with age 0–10 years, 11–20 years, etc). These frequencies are commonly displayed as a histogram, which provides a useful visual display of the data distribution.

Two key characteristics of an interval or ratio dataset are (a) its central tendency, the “average” location of the data, and (b) its variability, the extent to which individual values vary around the center. Both should be reported.

For approximately normally distributed data, the arithmetic mean appropriately describes the center of the data.3 It is the sum of all observed values, divided by the number of observations. The mean is commonly accompanied by the standard deviation (SD), which describes how far the data points are dispersed around the mean. For normally distributed data, approximately 68% of values lie within 1 SD, approximately 95% within 2 SDs, and >99% within 3 SDs on either side of the mean.3

The median is the value that divides a dataset into 2 equal-sized parts, with the same number of data points above and below it.3 It is relatively robust to asymmetric (skewed) distributions and to extreme values (outliers). Therefore, the median is preferred to the mean for non-normally distributed data, and it can also be used for ordinal data.3 The median is commonly reported with the first and third quartiles (Q1, Q3), often loosely referred to as the interquartile range (strictly, the interquartile range is the difference Q3 − Q1). Quartiles divide the dataset into 4 equally sized parts, and the interval between Q1 and Q3 contains the middle 50% of the data points.

Descriptive statistics are also sometimes used as point estimates of population parameters (eg, using a sample to estimate a proportion or a mean in the population). In this case, they should be accompanied by a confidence interval as a measure of the precision of the estimate.4
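The summary statistics described above can be sketched with Python's standard `statistics` module. The function name `summarize` and the age values are hypothetical, chosen to include one extreme value. Quartile conventions differ between software packages; the `"inclusive"` method used here is only one of several reasonable choices, so results may differ slightly from those of other tools.

```python
import statistics


def summarize(values):
    """Return the mean, sample SD, median, and quartiles of numeric data."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)      # sample SD (n - 1 in the denominator)
    median = statistics.median(values)
    # n=4 yields the three cut points Q1, Q2 (the median), and Q3.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {"mean": mean, "sd": sd, "median": median, "q1": q1, "q3": q3}


# Hypothetical ages in years, with one extreme value (95); illustrative only.
ages = [52, 58, 61, 63, 64, 67, 70, 71, 95]
summary = summarize(ages)

print(f"mean (SD): {summary['mean']:.1f} ({summary['sd']:.1f})")
print(f"median [Q1, Q3]: {summary['median']} [{summary['q1']}, {summary['q3']}]")
```

In this made-up sample the single extreme value pulls the mean above the median, which illustrates why the median with Q1 and Q3 is the more robust summary for skewed data.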