Abstract

Studies involving animal personalities and behavioural syndromes are being carried out in large numbers and submitted to behavioural journals like Ethology. It therefore seems useful to consider the most important criteria for acceptable publications in this area of research. Much of this information is already available in various well-cited methodological papers (see below), but as reviewers and editors of many of these submissions we felt that it might be useful to provide a brief summary of some key issues. Our aim here is to assist authors in avoiding common problems and help them to produce strong publishable papers in personality research. These criteria are also intended to serve as up-to-date guidelines for reviewers and editors of Ethology and other similar peer-reviewed journals. There is consensus that “animal personality” refers to the repeatable part of an individual's behaviour (Dingemanse, Kazem, Réale, & Wright, 2010; Réale, Reader, Sol, McDougall, & Dingemanse, 2007), with “variation in personality” referring to among-individual differences in average behaviour. Personality studies should therefore involve repeated measures designs. This enables statistical assessment of an individual's average behaviour (Dingemanse & Dochtermann, 2013), which may then in turn be appropriately linked to the scientific concept of interest. Studies that do not feature repeated measures of the behaviour(s) concerned are not formally studies of animal personality, and should perhaps not be presented as such (Niemelä & Dingemanse, 2018a). Such studies may still usefully link their work to animal personality research provided that the authors highlight that their conclusions are conditional upon the assumption that the single measurement of behaviour per individual is only a proxy of the true behavioural mean (or “personality”) of each individual (Niemelä & Dingemanse, 2018b). 
Unfortunately, this assumption is unlikely to hold in most instances, because behavioural repeatability is on average only about 0.4 (Bell, Hankison, & Laskowski, 2009; Holtmann, Lagisz, & Nakagawa, 2017), implying that a behaviour measured only once largely reflects the within-individual (60%) rather than the among-individual (40%) component (Dingemanse, Dochtermann, & Nakagawa, 2012). For this reason, the argument “we have previously shown that this behaviour is individually repeatable and thus we can use a single measure per individual as a proxy for its personality” is neither reasonable nor logical, because the single measure per individual will largely reflect the “unrepeatable” portion of individual phenotypes. A suitable degree of caution is therefore required in all such cases. We illustrate this point with a thought experiment, focusing on circulating levels of corticosterone and its apparent effects on fitness in a population of birds (hormonal traits have particularly low repeatability: R = 0.1–0.2; Holtmann et al., 2017). We might detect a strong negative relationship between fitness and individual levels of corticosterone when measured only once, and from that we could conclude the trait is under strong directional selection for reduced levels of corticosterone. This would obviously be a false conclusion if the repeatable part of corticosterone did not actually predict fitness. Instead, the real explanation could well be that harsh environmental conditions reduced reproductive success whilst simultaneously also temporarily elevating individual levels of corticosterone (this problem is well-known in evolutionary biology, where it is standard practice to control for such environmental confounds when measuring selection; Rausher, 1992; Stinchcombe et al., 2002). 
The potential for these types of biasing effects of environmental confounds is thus particularly high for traits with low repeatability, which includes many of the behaviours studied in animal personality research. An important associated issue here is that many behaviours exhibit plasticity in response to environmental factors that change relatively slowly (compared to the lifetime of the species), creating strong temporal (Allegue et al., 2017; Mitchell, Dujon, Beckmann, & Biro, 2019) or spatial (Niemelä & Dingemanse, 2017) autocorrelations. Individuals may therefore be repeatable only because of being assessed in the same repeatable environmental conditions to which they are responding plastically (Dingemanse, Kazem, et al., 2010; Martin & Réale, 2008) and not because of any genuine among-individual variation caused by genetics or developmental effects (Dochtermann, Schwab, Berdal, Dalos, & Royauté, 2019; Dochtermann, Schwab, & Sih, 2015; Stamps & Frankenhuis, 2016). This phenomenon is called “pseudo-repeatability” or “pseudo-personality” (Dingemanse & Dochtermann, 2013; Westneat, Hatch, Wetzel, & Ensminger, 2011) and has been firmly documented by meta-analyses showing that repeatability values decrease with increasing length of inter-test intervals (Bell et al., 2009). Research primarily centring on the ecological or evolutionary causes or consequences of “genuine” repeatable differences (as defined above) should therefore avoid pseudo-personality. This can be achieved by using study designs allowing statistical control for biasing effects of environmental spatiotemporal autocorrelations (Allegue et al., 2017; Araya-Ajoy, Mathot, & Dingemanse, 2015; Mitchell et al., 2019). Generally, a demonstration of animal personality is therefore most convincing when the inter-test interval is relatively long compared to the lifespan of the species (Allegue et al., 2017). 
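Pseudo-repeatability is straightforward to reproduce in silico. In the hedged sketch below (simulated data; the stable “site effect” is a hypothetical environmental confound, and the ANOVA-style estimator stands in for a mixed-effects model), no individual differs in its true behavioural mean, yet a naive repeatability estimate is substantial simply because each individual is always assayed under its own stable environmental conditions:

```python
import random
import statistics

random.seed(2)
n_ind, n_obs = 200, 4

# No genuine personality: every individual shares the same true behavioural
# mean, but each is always assayed at its own site, and behaviour responds
# plastically to a stable site characteristic (a hypothetical confound).
site_effect = [random.gauss(0, 1) for _ in range(n_ind)]
data = [[site_effect[i] + random.gauss(0, 1) for _ in range(n_obs)]
        for i in range(n_ind)]

# ANOVA-style repeatability: among-individual variance / total variance
ind_means = [statistics.mean(row) for row in data]
v_within = statistics.mean(statistics.variance(row) for row in data)
v_among = statistics.variance(ind_means) - v_within / n_obs
R_est = v_among / (v_among + v_within)
print(round(R_est, 2))   # substantial "repeatability" despite no personality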
For instance, it may not be reasonable to collect repeated measures a few days apart and then draw conclusions regarding how animal personality affects aspects of fitness or life history across weeks or months. Personality studies must therefore acknowledge that animals are both plastic and repeatable, and aim to separate these two aspects of behaviour (Dingemanse, Kazem, et al., 2010). Some studies define “animal personality” as pertaining to a particular class of behaviours regardless of repeated measures, often (inappropriately) citing Réale et al. (2007) in support. For example, behaviours related to coping with captive or novel conditions, risk-taking, boldness or aggression are often considered “personality traits.” However, such approaches do not easily integrate into the sub-field that has emerged within behavioural ecology that involves the development of adaptive explanations for repeatable individual variation (“personality”) in any behaviour (Dall, Houston, & McNamara, 2004). Furthermore, some studies refer to “personality traits” and define them as “behavioural traits that are repeatable.” This labelling is arguably not helpful because all evolved behaviours are repeatable provided we collect enough data (Falconer & Mackay, 1996), and thus it effectively replaces one label (“behaviour”) with another (“personality”). Defining variation in “animal personality” as among-individual variation in average behaviour across repeated observations (as detailed above) provides a much more useful terminology that does not suffer from such issues. Many of the points above also apply to behavioural syndromes (for a recent discussion, see Niemelä & Dingemanse, 2018a). In short, a “behavioural syndrome” is the among-individual correlation in repeated measures data across multiple behaviours (Dingemanse, Kazem, et al., 2010). It essentially represents the correlation between individual-mean values across suites of traits. 
As above, this requires repeated measures data for each behaviour (Dingemanse & Dochtermann, 2013). It also requires that the different behaviours are measured across separate assays; otherwise, the correlations may merely reflect different mutually inclusive or exclusive types of behaviour occurring during the same observation. Following our argument above, a correlation between two behaviours each measured only once probably reflects mostly the within-individual rather than the among-individual pattern of covariance (Brommer, 2013; Dingemanse et al., 2012). This is not a problem provided that among- and within-individual correlations do not differ (Brommer & Class, 2017; Dingemanse & Dochtermann, 2013). Unfortunately, meta-analyses often show that this assumption does not hold (Dochtermann, 2011; Niemelä & Dingemanse, 2018a). Again, studies involving single measurements per behaviour per individual may be usefully presented in the context of “behavioural syndromes,” provided that the caveats (assumptions) involved are clearly laid out and any conclusions in this regard are suitably toned down. Given the arguments above, it is clear that studies on animal personality and behavioural syndromes should normally report the among- and within-individual variance underpinning the behavioural repeatability reported in the paper (Dingemanse & Dochtermann, 2013) and coefficients of variation for each hierarchical level (Dochtermann & Royaute, 2019). The best approach currently available is to run a univariate mixed-effects model, where the explicit recommendation is to facilitate understanding of these results and any future meta-analyses by presenting all fixed and random effect parameter estimates along with their uncertainties (e.g., 95% credible intervals), information on the amount of variance that each component explains, and estimates of (adjusted) repeatability (Nakagawa & Schielzeth, 2010, 2013). 
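To illustrate the distinction between raw (“agreement”) and adjusted repeatability flagged above, the sketch below uses simulated data and a simple ANOVA-style variance decomposition as a stand-in for the recommended univariate mixed-effects model (the sex effect is a hypothetical fixed effect, and all variances are assumptions). A fixed effect left uncontrolled inflates the apparent among-individual variance; centring the data on the fixed-effect means recovers the adjusted repeatability:

```python
import random
import statistics

random.seed(3)
n_ind, n_obs = 200, 4
sex = [i % 2 for i in range(n_ind)]                # hypothetical fixed effect
ind_eff = [random.gauss(0, 1) for _ in range(n_ind)]
data = [[1.5 * sex[i] + ind_eff[i] + random.gauss(0, 1) for _ in range(n_obs)]
        for i in range(n_ind)]

def repeatability(rows):
    """ANOVA-style repeatability from repeated-measures rows."""
    means = [statistics.mean(r) for r in rows]
    v_w = statistics.mean(statistics.variance(r) for r in rows)
    v_a = statistics.variance(means) - v_w / len(rows[0])
    return v_a / (v_a + v_w)

raw_R = repeatability(data)                        # sex differences inflate this

# "Adjusted" repeatability: remove the fixed effect (centre within each sex)
sex_mean = {s: statistics.mean(x for i, row in enumerate(data)
                               if sex[i] == s for x in row) for s in (0, 1)}
adjusted = [[x - sex_mean[sex[i]] for x in row] for i, row in enumerate(data)]
adj_R = repeatability(adjusted)
print(round(raw_R, 2), round(adj_R, 2))            # raw exceeds adjusted
```

A mixed-effects model does this conditioning in one step while also propagating the uncertainty in each component, which is why reporting both the variance components and the (adjusted) repeatability, as Nakagawa and Schielzeth (2010) recommend, is so informative.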
Some studies instead choose non-parametric statistics, for example because distributional assumptions (e.g., of normality) were not met. It is important to understand that this precludes estimating key parametric parameters (e.g., repeatability, heritability and Pearson's correlations) that are firmly embedded in evolutionary theory (Dochtermann & Roff, 2010) and thereby diminishes the value of the study (Dingemanse et al., 2012). A suitable compromise may then be to run parametric analyses (e.g., mixed-effects models) and use bootstrapping or randomization procedures to derive appropriate null distributions for statistical testing. Fortunately, recent research suggests that mixed-effects models are, in fact, extremely robust to violations of distributional assumptions (Schielzeth et al., 2020). Another common approach is to record multiple measures of behaviour within the same observation period or test assay, which are then aggregated using principal component analysis (PCA) into a single component or latent variable to represent an overall behavioural score. We have often used this approach ourselves in the past as a means of summarizing data. For example, one could estimate the heritability of a PCA component (e.g., Dingemanse, Barber, & Dochtermann, 2020) and link it to further aspects under study. This approach, however, becomes problematic when PCA components are estimated from data sets containing repeated measures since this represents a major violation (Budaev, 2010) and comes with (unreasonable) further assumptions (e.g., that correlation structures do not differ among versus within individuals, see above). Extreme caution should therefore be taken when applying such data-reduction approaches. 
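As one illustration of the randomization route suggested above, the hedged sketch below (hypothetical simulated data; an ANOVA-style estimator stands in for a fitted mixed model) compares an observed repeatability against a null distribution generated by shuffling observations across individuals, which destroys any among-individual structure while preserving the raw data distribution:

```python
import random
import statistics

random.seed(4)
n_ind, n_obs = 60, 3
ind_eff = [random.gauss(0, 0.7) for _ in range(n_ind)]       # true R ~ 0.33
data = [[ind_eff[i] + random.gauss(0, 1) for _ in range(n_obs)]
        for i in range(n_ind)]

def repeatability(rows):
    means = [statistics.mean(r) for r in rows]
    v_w = statistics.mean(statistics.variance(r) for r in rows)
    v_a = statistics.variance(means) - v_w / len(rows[0])
    return v_a / (v_a + v_w)

obs_R = repeatability(data)

# Randomisation null: reassigning observations to individuals at random
# yields the distribution of R expected by chance alone
flat = [x for row in data for x in row]
null = []
for _ in range(1000):
    random.shuffle(flat)
    null.append(repeatability([flat[i * n_obs:(i + 1) * n_obs]
                               for i in range(n_ind)]))
p = sum(r >= obs_R for r in null) / len(null)
print(round(obs_R, 2), p)
```

This retains the parametric estimate of repeatability, which remains interpretable within evolutionary theory, while basing the statistical test on a distribution-free null.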
A more appropriate approach would arguably be to report the correlation matrix between the different behavioural measures recorded within the same assay (within and among individuals) and then pragmatically take forward one representative behavioural measure that is normally distributed and covaries (i.e., loads) most strongly with the overall PCA component or latent variable estimated (Araya-Ajoy & Dingemanse, 2014; for other alternative approaches, see Dochtermann & Nelson, 2014). Either way, following the arguments above, a PCA cannot be used to imply that there is a “behavioural syndrome” or “animal personality structure” if the study does not feature repeated measures and separate behavioural assays. To our current knowledge, behavioural syndromes are best estimated by fitting multivariate mixed-effects (GLMM) models to repeated measures data across multiple traits (Dingemanse & Dochtermann, 2013). In such cases, it is important to report the variances, covariances and correlations at all levels (within and among individuals) either in the main text or supplementary materials. This enables meta-analyses of the entire matrix structure (e.g., Dochtermann & Dingemanse, 2013; Royauté, Hedrick, & Dochtermann, 2020) and comparisons across biological levels (Berdal & Dochtermann, 2019; Niemelä & Dingemanse, 2018a). As above, it is the among-individual correlation that best represents the behavioural syndrome in these instances, because it also takes into account additional (otherwise biasing) sources of variance and covariance within such data (e.g., Careau & Wilson, 2017; Downs & Dochtermann, 2014). Alternative statistical measures are sometimes used to estimate behavioural syndrome structures, which may be problematic as they can lead to biased estimates (Downs & Dochtermann, 2014). Unfortunately, multivariate mixed-effects models are extremely data hungry, implying that they can only be applied to large data sets. 
This requirement may inherently lead to publication bias towards species where such data can readily be collected. Such issues are worrying, but may partly be overcome by trading off investments in other research conducted within the same laboratory, collaborative teamwork across laboratories or other smart solutions, and many of these choices are readily available to most principal investigators in this field (Niemelä & Dingemanse, 2018b). An important problem in behavioural syndrome research is how to deal with large individual-level correlation or variance–covariance matrices, whether they are derived from multivariate (GLMM) models or not. Unfortunately, using PCA to summarize the major axes of variation in these correlation matrices does not allow assessments of how well such multivariate data fit specific alternative biological hypotheses. A more suitable statistical tool in this case is structural equation modelling (SEM) (Grace, 2006) or similar multivariate methods such as factor analysis or path analysis (Martin et al., 2019), because particular a priori hypotheses for specific covariance structures (e.g., based on the available literature) can then be tested and formally compared to find the most likely fit (Dingemanse, Dochtermann, & Wright, 2010; Dochtermann & Jenkins, 2007). One further major issue is how measures of animal personality, in terms of differences in the mean behaviour per individual, are statistically linked with other physiological, ecological and evolutionary processes of interest. One common approach has been to first calculate the mean behaviour per individual and then statistically compare it in a separate model to some other biological variable, ecological measure or fitness, etc. This often involves doing statistics on what are essentially statistical estimates (i.e., “stats-on-stats”), and thereby ignores the error variance around both sets of measures. 
In order to avoid biased or misleading results, whenever possible the required estimates should be derived from a single (multivariate) mixed model (Hadfield, Wilson, Garant, Sheldon, & Kruuk, 2010; Houslay & Wilson, 2017). Unfortunately, these types of complex statistical models cannot always be applied to every data set, and this simply needs to be openly acknowledged in the paper. Suitable alternatives may be available, involving estimating the posterior distribution of any parameter and then taking this entire distribution (rather than just the point estimates) forward into subsequent analyses. For example, an increasingly popular approach is to determine the relative fit of alternative SEMs based on the entire posterior distribution (rather than the point estimates) of among individual (Araya-Ajoy & Dingemanse, 2014) or genetic correlation matrices (e.g., Dingemanse, Barber, et al., 2020; Dochtermann & Dingemanse, 2013). Such solutions are welcome, but can introduce their own biases and are therefore best used in combination with simulations that demonstrate their utility for the particular data set at hand (e.g., Araya-Ajoy & Dingemanse, 2017; Dingemanse, Moiron, Araya-Ajoy, Mouchet, & Abbey-Lee, 2020; Dochtermann & Dingemanse, 2013). Again, simpler pragmatic solutions are acceptable, but authors should always be clear about their different options, how any issues related to stats-on-stats were dealt with, and how those decisions might affect their biological conclusions. As with all issues of study design and statistical analyses, it is important to note that research methods are constantly evolving (examples in this subject area include: Araya-Ajoy et al., 2015; Briffa, Rundle, & Fryer, 2008; Cleasby, Nakagawa, & Schielzeth, 2015; Dochtermann & Royaute, 2019; Mitchell et al., 2019). This summary is therefore the best current advice that we are aware of given our own reading and expertise. 
We ourselves have made many, if not all, of the aforementioned “mistakes” in our past publications, but we have always tried not to repeat them once a superior approach has been recommended. The point here is that researchers, and journal reviewers and editors, should stay up-to-date with currently acceptable standards within these types of fast-moving areas of research. However, we should all also review new methods critically (e.g., Class, Dingemanse, Araya-Ajoy, & Brommer, 2017; Cleasby et al., 2015; Dingemanse et al., 2012), and it is important that reviewers and editors alike are appropriately informed about study design and advanced statistical methods to be able to properly judge the merits of statistical approaches proposed or used in papers sent out for review (Bolker et al., 2009). Fortunately, this laudable aim is facilitated by accessible educational software packages, such as the R-package SQuID ("Statistical Quantification of Individual Differences"; Allegue et al., 2017), and by various “How-to” recipe papers (Dingemanse & Dochtermann, 2013; Wilson et al., 2010). However, an important message here is that we must guard against dogmatism and avoid paper rejection just because authors fail to use a popular method when they might have quite reasonable and defendable reasons for not doing so. We would like to end by noting that although we have focussed primarily on issues concerning study design and statistical methods, our real motivation here is to improve the biological insights gained from investigating the evolutionary ecology of animal personality and behavioural syndromes. Authors submitting their work for publication in Ethology and other behavioural ecological journals should therefore take note of one further requirement: the scientific question should clearly connect animal personality or behavioural syndromes to the ecology and/or evolution of the species involved. 
Studies that merely describe animal personality or behavioural syndromes are important, particularly as part of newly established study systems, but without these connections we would argue that they are not suitable for journals such as Ethology. Future studies should attempt to advance this field by publishing convincing tests of assumptions and predictions of adaptive theories in animal behaviour (Dall & Griffith, 2014; Dingemanse & Wolf, 2010). This editorial should therefore not be used to fuel criticisms of the field (Beekman & Jordan, 2017), but rather be seen as encouraging the continued submission of many high-quality studies in this valuable and flourishing area of behavioural research.

We thank members of the Dingemanse laboratory and other participants of their daily digital coffee break for further feedback and discussion. We also thank members of SQuID for continued discussion of this topic over the years, and Denis Réale, Raphael Royauté and Tom Tregenza for comments on a previous draft of this Editorial. N.J.D. was supported by the German Science Foundation (grant no. DI 1694/1-1) and J.W. by the Research Council of Norway (SFF-III 223257/F50).
