It is already 25 years since the publication of John Aitchison’s seminal paper on the statistical analysis of compositional data, which finally provided a solution to the problem of spurious correlation under a constant-sum constraint, first identified by Karl Pearson in the nineteenth century. Log-ratio transformations provide a coherent way to “escape” the constant-sum constraint, which in turn allows sub-compositions to be analysed consistently. It is surprising that such a fundamental advance has still not been fully embraced by scientific disciplines that deal frequently with compositional data, such as geology. Given this slow adoption, a book like this one is especially valuable. It strikes an excellent balance between the theoretical advances of the 20 years since John Aitchison’s book and detailed analyses of practical problems, which show that log-ratio methods yield more meaningful solutions in practice. Overall, there are four theoretical chapters, eight chapters that apply the methods to real datasets and three chapters that cover the software available for compositional data analysis. Although the book was initially based on conference contributions, the chapters are quite consistent in notation and level, suggesting hard work by the editors! The introductory chapter by Pawlowsky-Glahn and Egozcue explains how log-ratio transformations resolve the spurious correlations described by Pearson. It also covers the advances since Aitchison’s book, including isometric log-ratio (ilr) transformations, which provide an alternative to the additive log-ratio (alr) and centred log-ratio (clr) transformations.
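The three log-ratio transformations can be made concrete with a minimal NumPy sketch of closure, alr, clr and ilr. The function names are illustrative. The Helmert-type basis used for ilr is an assumption: it is one convenient orthonormal basis among infinitely many, not necessarily the one used in the book.

```python
import numpy as np


def closure(x, total=1.0):
    """Rescale each composition (row) so that its parts sum to `total`."""
    x = np.asarray(x, dtype=float)
    return total * x / x.sum(axis=-1, keepdims=True)


def alr(x):
    """Additive log-ratio: log of each part over the last part (D - 1 coordinates)."""
    x = np.asarray(x, dtype=float)
    return np.log(x[..., :-1] / x[..., -1:])


def clr(x):
    """Centred log-ratio: log of each part over the geometric mean (D coordinates summing to 0)."""
    logx = np.log(np.asarray(x, dtype=float))
    return logx - logx.mean(axis=-1, keepdims=True)


def helmert_basis(d):
    """Orthonormal basis (as rows) of the zero-sum hyperplane of R^d.

    Hypothetical choice: one possible ilr basis among many; any orthonormal
    basis of this hyperplane defines a valid ilr transformation.
    """
    basis = np.zeros((d - 1, d))
    for i in range(1, d):
        basis[i - 1, :i] = 1.0 / i
        basis[i - 1, i] = -1.0
        basis[i - 1] *= np.sqrt(i / (i + 1.0))  # normalise the row to unit length
    return basis


def ilr(x, basis):
    """Isometric log-ratio: clr coordinates expressed in an orthonormal basis (D - 1 coordinates)."""
    return clr(x) @ basis.T


def ilr_inv(y, basis):
    """Map ilr coordinates back to the simplex, closed to sum 1."""
    return closure(np.exp(np.asarray(y, dtype=float) @ basis))


x = closure([[0.6, 0.3, 0.1],
             [0.2, 0.5, 0.3]])
V = helmert_basis(3)

print(alr(x).shape)                                # (2, 2)
print(np.allclose(clr(x).sum(axis=1), 0.0))        # True: clr rows sum to zero
print(np.allclose(clr(x), clr(100.0 * x)))         # True: log-ratios ignore the total
print(np.allclose(ilr_inv(ilr(x, V), V), x))       # True: ilr is invertible on the simplex
# Euclidean distance between ilr coordinates equals that between clr coordinates
print(np.isclose(np.linalg.norm(ilr(x[0], V) - ilr(x[1], V)),
                 np.linalg.norm(clr(x[0]) - clr(x[1]))))  # True
```

Because the basis is orthonormal, replacing it with another one changes the individual ilr coordinates but not the distances between compositions. Back-transformed results are unaffected in the same way.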
Arguably, however, the major benefits of the ilr transformations are mathematical (an orthogonal basis) and geometric (a neat link with the geometry implied on the simplex) rather than statistical. Results transformed back to the simplex do not depend on the choice of transformation, and ilr coordinates can be harder to interpret. The chapter shows that centring the data at the geometric mean, a special case of the perturbation operation, aids visualization. Biplots complement this by providing a useful graphical summary of the variation in the simplex. The second theoretical chapter, by Egozcue and Pawlowsky-Glahn, gives a detailed description of the geometry on the simplex, including the distance metric, which takes a simple form in the clr and ilr representations. While this chapter is interesting and important from a mathematical and geometrical perspective, it is not clear that it provides additional statistical or geological insight. The third theoretical chapter, by Martin-Fernandez and Thio-Henestrosa, discusses possible solutions to the problem of zero components. It treats each zero as a true value that is missing because it falls below the detection limit, rather than as a complete absence of the component. The authors examine non-parametric imputation following the missing-data methods of Rubin and Little and conclude that multiplicative adjustments are superior to additive or simple adjustments. They nonetheless stress the need for sensitivity analysis to check robustness in the context of the chosen analysis. They also claim that non-parametric imputation causes artificial correlation when more than one component contains zeros. However, this outcome seems to reflect the specific non-parametric method used rather than a weakness of non-parametric methods.
J. Bacon-Shone, Social Sciences Research Centre, The University of Hong Kong, Pokfulam Road, Hong Kong. e-mail: johnbs@hku.hk
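Two operations discussed in the chapters reviewed above lend themselves to a short sketch. The first is centring a data set at its geometric-mean centre by perturbation. The second is the multiplicative adjustment of zeros (often called multiplicative replacement) favoured by Martin-Fernandez and Thio-Henestrosa. The NumPy code below is a minimal illustration under stated assumptions, not the book's implementation:

- The function names are hypothetical.
- Each row is assumed to be already closed to a constant sum.
- The single replacement value `delta` is illustrative; in practice it would be chosen relative to the detection limit, possibly per part.

```python
import numpy as np


def closure(x, total=1.0):
    """Rescale each composition (row) so that its parts sum to `total`."""
    x = np.asarray(x, dtype=float)
    return total * x / x.sum(axis=-1, keepdims=True)


def perturb(x, y):
    """Perturbation: component-wise product followed by closure."""
    return closure(np.asarray(x, dtype=float) * np.asarray(y, dtype=float))


def centre(X):
    """Centre of a data set: closed column-wise geometric mean (all parts must be > 0)."""
    return closure(np.exp(np.log(np.asarray(X, dtype=float)).mean(axis=0)))


def centre_data(X):
    """Centre the data by perturbing each composition by the inverse of the centre."""
    return perturb(X, 1.0 / centre(X))


def multiplicative_replacement(x, delta, total=1.0):
    """Replace zeros by `delta` and shrink non-zero parts by one common factor.

    Assumes each row of `x` is already closed to `total`. The result still
    sums to `total`, and ratios between non-zero parts are unchanged.
    """
    x = np.asarray(x, dtype=float)
    is_zero = x == 0
    n_zero = is_zero.sum(axis=-1, keepdims=True)
    factor = 1.0 - n_zero * delta / total
    return np.where(is_zero, delta, x * factor)


X = closure([[0.6, 0.3, 0.1],
             [0.2, 0.5, 0.3],
             [0.1, 0.2, 0.7]])
print(centre(centre_data(X)))  # close to [1/3, 1/3, 1/3], the barycentre of the simplex

# delta = 0.005 is purely illustrative (hypothetical detection-limit-based value)
r = multiplicative_replacement([0.5, 0.3, 0.2, 0.0], delta=0.005)
print(np.isclose(r.sum(), 1.0))            # True
print(np.isclose(r[0] / r[1], 0.5 / 0.3))  # True
```

An additive adjustment subtracts a fixed amount from each non-zero part. The multiplicative version instead leaves the ratios, and hence the log-ratios, of the non-zero parts untouched, which makes it attractive for log-ratio analysis.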