What does it all mean?

Just over a quarter of a century ago, my friend and colleague Steve Juggins and a group of other palaeoecologists came up with a clever way to relate the composition of diatom samples taken from different levels of a sediment core to the environmental conditions of the lake at the time that these diatoms were alive.   At the heart of this was a set of statistical tools called “transfer functions” and the use of these has proliferated over subsequent years, spilling from diatoms to many other groups of organisms and from palaeoecological studies to contemporary investigations of man’s impact on the environment.   So pervasive have these methods become that Steve returned to the subject a few years ago and critiqued the many misuses of the method that he was seeing in the literature.

The principle behind the use of transfer functions is that each species has a characteristic response to an environmental pressure gradient (in early studies this was pH) which could be portrayed as a unimodal (approximately bell-shaped) curve. The point along the gradient where a species is most abundant represents the “optimum” condition, the level of the pressure where the species thrives best. The average of the optima of all organisms in a sample, Steve and colleagues showed, could then be used to estimate the value of the pressure. This unlocked the door to quantitative reconstructions of changes in acidification of lakes in the UK and Scandinavia that, in turn, ultimately shaped environmental policy. It was one of the most impressive achievements of applied ecologists in the 20th century.

A diagrammatic representation of the principle behind transfer functions: each organism has a characteristic response to the predominant pressure (nutrient/organic pollution in this case).
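For readers who prefer to see the arithmetic, here is a minimal sketch of the idea in Python. It assumes the simplest abundance-weighted form of the calculation: a taxon's optimum is the abundance-weighted mean of the measured variable across a training set, and the inferred value for a new sample is the abundance-weighted mean of the optima of the taxa it contains. The taxon names, counts and pH values are hypothetical, and the function names `wa_optima` and `wa_infer` are my own. Published transfer functions add further steps (such as a correction for the shrinkage that comes from averaging twice) that are left out here.

```python
def wa_optima(abundances, env):
    """Estimate each taxon's optimum as an abundance-weighted mean.

    abundances: list of samples, each a dict {taxon: abundance}
    env: list of measured values (e.g. pH), one per sample
    """
    totals, weighted = {}, {}
    for sample, x in zip(abundances, env):
        for taxon, y in sample.items():
            totals[taxon] = totals.get(taxon, 0.0) + y
            weighted[taxon] = weighted.get(taxon, 0.0) + y * x
    return {t: weighted[t] / totals[t] for t in totals if totals[t] > 0}


def wa_infer(sample, optima):
    """Infer the variable for one sample from the optima of its taxa."""
    # Taxa without an optimum in the training set are ignored.
    known = {t: y for t, y in sample.items() if t in optima}
    total = sum(known.values())
    if total == 0:
        raise ValueError("sample contains no taxa with a known optimum")
    return sum(y * optima[t] for t, y in known.items()) / total


# Hypothetical training set: three lakes with measured pH.
training = [
    {"taxon_A": 80, "taxon_B": 20},
    {"taxon_A": 50, "taxon_B": 50},
    {"taxon_B": 30, "taxon_C": 70},
]
ph = [5.0, 6.0, 7.0]

optima = wa_optima(training, ph)

# Hypothetical sample from a sediment core, with no measured pH.
fossil = {"taxon_A": 60, "taxon_B": 40}
inferred_ph = wa_infer(fossil, optima)
print(optima)
print(round(inferred_ph, 2))
```

Note that the inferred value can never fall outside the range of the optima, which is one reason why the spread of optima along the gradient matters so much to the argument that follows.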

Part of the reason for their success in building strong predictive models was, I suspect, that the pollutant that they were focussed upon had a direct effect on the physiology of the cells which, in turn, created strong selective pressures on the community. Another reason was that palaeoecological samples condense all the habitat variation within a lake (plankton v benthic, seasonal differences, etc.) into a single assemblage. This, then, raises the question of how well we should expect transfer functions to perform when applied to assemblages which represent much narrower windows of space and time, and when the pollutants of interest exert indirect rather than direct effects on the organisms. Or, to recast that question another way, are some of the problems we encounter interpreting diatom indices from rivers another form of the misuse of transfer functions that Steve dissects in his review?

It is easy to believe that transfer functions do work when applied to contemporary diatom assemblages from rivers. If you evaluate datasets you will almost certainly find that the “optima” for all the species do appear to be arranged along a continuum along the pressure gradient. The question that we need to ask is whether this represents a causal relationship or is just a statistical artefact. I touched on this issue in “What we expect is often what we get …” but, in that post, I was mostly interested in how samples react along a gradient, not the response of individual species. I suspect that, given the importance of alkalinity in freshwater algal ecology (see “Ecology in the Hard Rock Café”), this must influence the distribution of optima along a nutrient gradient. This will be compounded when sample sizes are small, as the likelihood is that the sample optimum will not correspond exactly to the “true” optimum for the species in question (a question Steve has also addressed in a more recent paper – see reference list below). Finally, this is all embedded within a larger problem: that most of the work I have discussed here involves statistical inference from datasets compiled from samples collected from a range of sites in a region, but is intended to address changes in time rather than space (so-called “space-for-time substitution” – see reference by Pickett below). There has been relatively little testing of species preferences under controlled experimental conditions.
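The point about small sample sizes can be illustrated with a toy simulation. This is a sketch under assumptions of my own, not a reanalysis of any real dataset: a single taxon has a bell-shaped response with a “true” optimum of 50 on an arbitrary 0 to 100 gradient, sites are scattered at random along that gradient, and a multiplicative noise term stands in for patchiness and counting error. The optimum is then estimated as an abundance-weighted mean, many times over, for small and large training sets.

```python
import math
import random
import statistics


def gaussian_response(x, optimum, tolerance):
    """Bell-shaped abundance curve along the gradient."""
    return math.exp(-((x - optimum) ** 2) / (2 * tolerance ** 2))


def estimate_optimum(n_sites, rng, optimum=50.0, tolerance=15.0):
    """Weighted-average estimate of the optimum from n_sites random sites."""
    xs = [rng.uniform(0.0, 100.0) for _ in range(n_sites)]
    # Multiplicative noise is a stand-in for patchiness and counting error.
    ys = [gaussian_response(x, optimum, tolerance) * rng.uniform(0.2, 1.8)
          for x in xs]
    return sum(x * y for x, y in zip(xs, ys)) / sum(ys)


def spread_of_estimates(n_sites, n_trials=500, seed=1):
    """Standard deviation of the estimated optimum over repeated datasets."""
    rng = random.Random(seed)
    estimates = [estimate_optimum(n_sites, rng) for _ in range(n_trials)]
    return statistics.stdev(estimates)


spread_small = spread_of_estimates(10)    # few sites in the training set
spread_large = spread_of_estimates(200)   # many sites in the training set
print(f"10 sites:  spread of estimated optimum = {spread_small:.2f}")
print(f"200 sites: spread of estimated optimum = {spread_large:.2f}")
```

The estimates scatter more widely around the true value when only a handful of sites is available, even though nothing else about the simulated taxon has changed. The simulation says nothing about confounding variables such as alkalinity, which would add bias on top of this scatter.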

In practice, I suspect, the physiological response of benthic algae to nutrients is less complicated than our noisy graphs suggest.   I set out a version of this in “What we expect is often what we get …”.   That post dealt primarily with communities of microalgae; this is the same basic scheme (with some slight revisions) but posed in terms of the physiological response of the organisms.  It borrows from the habitat matrix conceptual model of Barry Biggs, Jan Stevenson and Rex Lowe (which, itself, builds on earlier work on terrestrial plants by Phil Grime and colleagues).

An alternative explanation for the response of benthic algae to nutrients and organic pollution.  a., b., c. and d. are explained in the text.

  a. Low nutrients / high oxygen concentrations – the “natural state” in most cases. Biggs et al. referred to species adapted to such conditions as “stress-adapted” as they can cope in situations where nutrients are scarce. Associated with TDI scores 1 and 2. Examples: Hannaea arcus, Achnanthidium minutissimum, Tabellaria flocculosa.
  b. High nutrients / no “secondary effects” of eutrophication – these are “competitive” species in Biggs et al.’s template and can thrive when there is anthropogenic enrichment of nutrients. Ideally, this group would consist of species that have a physiological adaptation that allows them to thrive when nutrients are plentiful though, in practice, our understanding is based mostly on inference from spatial patterns. The “window” where such species can thrive is wide, and will overlap with the two states described below, in many cases. Associated with TDI scores 3 and 4. Examples: Amphora pediculus, Rhoicosphenia abbreviata, Cocconeis pediculus. Cladophora glomerata would be a good example of a non-diatom that belongs to this group.
  c. High nutrients plus “secondary effects” of eutrophication – this category extends the habitat template of Biggs et al. to include organisms that are reacting to secondary effects of nutrient enrichment (e.g. shade and low oxygen) rather than to the elevated nutrients per se and is, consequently, difficult to differentiate from a direct response to organic pollution. Associated with TDI scores 4 and 5. Examples include several species of Nitzschia as well as Mayamaea and Fistulifera, amongst others. Importantly, this group may co-exist with representatives from group b. – perhaps inhabiting different zones of the biofilm that typically blend together when a sample is taken.
  d. High nutrients / very low oxygen – a final category that represents extreme situations when an ability to cope with reducing conditions is beneficial, and where diatoms that are facultative heterotrophs may thrive. Associated with TDI score 5. Heterotrophic fungal and bacterial growths (“sewage fungus”) may also be abundant. Once again, there is likely to be some overlap between this and other groups. Technically, this group is more likely to be associated with serious organic pollution than with nutrients; however, it will be found at sites where nutrient concentrations are high and it is possible that an association with nutrients may be inferred from spatial patterns.
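To show how a scheme with a few categories might be used in practice, here is a minimal sketch that summarises a sample by the share of the count falling into each of the four groups, labelled a to d as in the figure caption. The group assignments are limited to the examples named in the list above; the counts are hypothetical, the function name `group_shares` is my own, and any taxon without an assignment is reported separately rather than guessed at.

```python
# Group membership for the example taxa named in the list above.
# Group d has no named diatom examples, so it starts empty.
GROUPS = {
    "Hannaea arcus": "a",
    "Achnanthidium minutissimum": "a",
    "Tabellaria flocculosa": "a",
    "Amphora pediculus": "b",
    "Rhoicosphenia abbreviata": "b",
    "Cocconeis pediculus": "b",
    "Mayamaea": "c",
    "Fistulifera": "c",
}


def group_shares(counts, groups=GROUPS):
    """Return the proportion of the total count in each group.

    counts: dict {taxon: number of valves counted}
    Taxa not listed in `groups` are pooled under "unassigned".
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("empty sample")
    shares = {"a": 0.0, "b": 0.0, "c": 0.0, "d": 0.0, "unassigned": 0.0}
    for taxon, n in counts.items():
        shares[groups.get(taxon, "unassigned")] += n / total
    return shares


# Hypothetical count of 300 valves from a river sample.
sample = {
    "Achnanthidium minutissimum": 120,
    "Amphora pediculus": 150,
    "Mayamaea": 20,
    "Navicula lanceolata": 10,  # not in the list above, so unassigned
}

shares = group_shares(sample)
for group, share in shares.items():
    print(f"{group}: {share:.1%}")
```

A summary of this kind makes no claim about where each species sits within its group, which is the point: it carries only as much precision as the conceptual model can support.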

We are left, in other words, with a choice between deriving optima along a continuous scale based on inferences from spatial patterns within which we know that there are significant confounding variables or dividing species into a few physiologically-defined categories for which there is not very much experimental underpinning.   Neither is ideal, and some of our recent analyses suggest that, in terms of model strength, there is little to choose between them.   The former, in my view, suggests an artificially high level of precision that is unrealistic, given the current state of knowledge.   The latter, on the other hand, links the data to a conceptual model rather than simply relying upon the numbers that squirt out at the far end of a statistical process.

That does not mean that such an approach might not be appropriate for some other groups of organisms. The reason why I urge simplicity for diatoms is largely because of the scale of the habitats that we are sampling, in relation to the wider patterns of variability. A continuous series of optima may be appropriate in some cases too. Macrophyte surveys, for example, encompass all visible organisms found along a 100 m stretch. These will have a range of life history and nutrient acquisition strategies: some of these will take up nutrients from the water, some from the sediments. Different types of sediment will vary in the supply of phosphorus and nitrogen, and so on. There will still be issues of confounding variables and risks of inferring from correlative rather than causal relationships, but perhaps the overall patchiness experienced over the survey length will create a more complex web of interactions between nutrients and community that justifies a continuous scale.

For diatoms, however, simplicity is probably the best choice.   In the absence of definitive evidence one way or the other we apply Occam’s Razor (“entities should not be multiplied unnecessarily”) and opt for the simpler of the two hypotheses pending evidence to the contrary.   This, in turn, may address a deeper issue: that of finding robust answers to complex problems (see “Unravelling causal thickets …”).   Inference from statistical models is only as good as the conceptual models that underpin those models and, I fear, we too often are so lost in the detail of the many confounding variables that we lose sight of our goals.  Being able to understand our observations in terms of ecological process is the first step to finding robust solutions to our problems.

References

Bennion, H., Juggins, S. & Anderson, N.J. (1996).  Predicting epilimnetic phosphorus concentrations using an improved diatom-based transfer function and its application to lake eutrophication management. Environmental Science & Technology 30: 2004-2007.

Biggs, B.J.F., Stevenson, R.J. & Lowe, R.L. (1998). A habitat matrix conceptual model for stream periphyton. Archiv für Hydrobiologie 143: 21-56.

Birks, H.J.B.,  Line, J.M., Juggins, S., Stevenson, A.C. & ter Braak, C.J.F.  (1990). Lake surface-water chemistry reconstructions from palaeolimnological data. Diatoms and pH reconstruction. Philosophical Transactions of the Royal Society of London Series B 327: 263-278.

Juggins, S. (2013).  Quantitative reconstructions in palaeolimnology: new paradigm or sick science?  Quaternary Science Reviews 64: 20-32.

Kelly, M.G., King, L. & Ní Chatháin, B. (2009).  The conceptual basis of ecological status assessments using diatoms.  Biology and Environment: Proceedings of the Royal Irish Academy 109B: 175-189.

Pickett, S.T.A. (1988). Space-for-time substitution as an alternative to long-term studies. Pp. 110-135. In: Long-term Studies in Ecology: Approaches and Alternatives (edited by G.E. Likens). Springer-Verlag, New York.

Reavie, E.D. & Juggins, S. (2011).  Exploration of sample size and diatom-based indicator performance in three North American phosphorus training sets.  Aquatic Ecology 45: 529-538.
