Murder on the Barcode Express …

A long time ago, Agatha Christie imagined a train coming to a halt in a snowdrift somewhere in Croatia. By the morning, one of the passengers was dead. Eighty years later, a group only slightly larger than Hercule Poirot’s pool of suspects gathered in a room in modern Zagreb to plot another fiendish murder. The victim, this time, would be … traditional diatom taxonomy.

“Murder” is far too strong a term for this particular whodunit; maybe I should say “aiding and abetting” rather than actually committing the crime, but I think the outcome might be the same.  The conspirators in Zagreb are all involved in developing methods that use molecular barcoding to identify diatoms and have been busily collecting sequences of the many diatom species in order to establish the libraries that we need to link these barcodes to the appropriate Linnaean binomial.   Some years into this, we still have no more than about 15% of freshwater diatom species matched to barcodes.  We are starting to think about ways of filling in the gaps more quickly than is possible using the conventional approach of isolating a diatom, growing it in culture and then sequencing the appropriate marker genes.

The most radical of these alternatives is to by-pass Linnaean binomials altogether and classify diatoms by their barcodes alone – as “operational taxonomic units” or OTUs. Most of us have spent the bulk of our careers using morphology-based taxonomy, and any move away from it feels like an act of treachery towards a fundamental tenet of our craft. But the time has come to take a dispassionate view and ask what a species name brings to ecology. At a very practical level, Linnaean binomials make it much easier for us to compare data with colleagues and with records in the literature. Taxonomists would argue that their work helps us to understand the relationships between species but, unfortunately, in this particular branch of science, we make little use of these relationships, and the role of taxonomy is primarily to give us a consistent means of organising the myriad tiny pieces of silica that we find in our samples.

That business of consistent naming could, in theory, be performed just as efficiently by giving each barcode a digital tag as an OTU, and this would also work for the 85% of species where the link between traditional morphology-based taxonomy and marker genes has not yet been established. So what about the link that Linnaean binomials give us to established knowledge? Here, again, we need to be brutally frank: ecological information for most freshwater diatoms is limited to their preferences for hardness/alkalinity, inorganic nutrients, organic pollution, acidity and salinity, and that information can be replicated very easily by linking files of metabarcoding and environmental data. There are very few experimental studies that offer insights into the ecology of freshwater benthic diatoms beyond those gained from looking for associations between diatom distribution and a few common variables.
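To make “linking files of metabarcoding and environmental data” a little more concrete, here is a minimal sketch of weighted-averaging regression, one common way of estimating an optimum for each taxon from paired biological and environmental data. It assumes a table of read counts per OTU per sample and a single environmental variable per sample; the OTU labels, counts and phosphorus values are all hypothetical, and real read counts may need transforming or rarefying before use.

```python
# Weighted-averaging (WA) regression sketch: each OTU's "optimum" is the
# abundance-weighted mean of an environmental variable across the samples
# in which it occurs. All OTU labels and values are hypothetical.

def wa_optima(abundance, env):
    """Return {otu: optimum} given abundance[sample][otu] and env[sample]."""
    totals = {}    # summed abundance of each OTU across samples
    weighted = {}  # summed abundance * environmental value for each OTU
    for sample, counts in abundance.items():
        x = env[sample]
        for otu, n in counts.items():
            totals[otu] = totals.get(otu, 0.0) + n
            weighted[otu] = weighted.get(otu, 0.0) + n * x
    # Skip any OTU whose total abundance is zero to avoid dividing by zero
    return {otu: weighted[otu] / totals[otu] for otu in totals if totals[otu] > 0}


# Hypothetical read counts per sample from a metabarcoding run ...
abundance = {
    "site1": {"OTU_001": 120, "OTU_002": 10},
    "site2": {"OTU_001": 40, "OTU_002": 60},
    "site3": {"OTU_002": 150},
}
# ... and a hypothetical environmental variable (e.g. phosphorus, µg/L)
env = {"site1": 20.0, "site2": 80.0, "site3": 200.0}

print(wa_optima(abundance, env))
```

Nothing in this calculation needs a Linnaean binomial: the same arithmetic works whether the keys are species names or OTU tags.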

The plotters plotting …  DNAqua-net workshop in Zagreb, November 2017.  The top photograph shows Zagreb cathedral against the skyline.

The problem is not that we do not see the merits of traditional Linnaean taxonomy; it is that we cannot make a strong case for the funding necessary to collect barcodes for all species. The final downward thrust of the dagger will, in other words, be inflicted by the bureaucrats whose budgets will not stretch to cataloguing the enormous breadth of algal diversity. Diatoms sit in the awkward middle ground between larger organisms such as fish, where any suggestion of not using traditional taxonomy would be greeted with derision, and the microbial world, where the idea of applying Linnaean binomials to the enormous diversity uncovered by molecular techniques is equally risible. Diatom names mean little to the bureaucrats who manage our environmental agencies and, given the choice between a spreadsheet of incomprehensible Latin names or one of equally incomprehensible OTUs, all else being equal, they will choose the cheapest.

“All else being equal” is the key phrase. I think that there is now growing awareness that one downside of barcoding is that it risks sidestepping the need for trained biologists at all: samples will be collected by technicians, processed in high-throughput laboratories and results churned out through black box computer programs. The situation for diatoms is worse than for most groups of organisms used for ecological assessment because so much attention is given to the laboratory stages of producing a list of taxa and relative abundances. Moreover, we are now approaching the point at which DNA sequencers can produce data of equivalent sensitivity to that produced by light microscopy. The message that barcoding has the potential to be a good friend but a poor master could be lost as our paymasters recognise the potential for reducing costs. What we need to do now is use those “little grey cells” to ensure that good biological insight is not the victim of a heinous crime.


Winning hearts and minds …

I write several of my posts whilst travelling, though I am always conscious of the hypocrisy of writing an environmentally-themed blog whilst, at the same time, chalking up an embarrassing carbon footprint. Last month, however, I participated in my first “eConference”, in which the participants were linked by the internet. With over 200 people from all over Europe, and beyond, attending for all or part of the three days, there was a substantial environmental benefit and, whilst there was little scope for the “off-piste” conversations that are often as useful as the formal programme of a conference, there were some unexpected benefits. I, for example, managed to get the ironing done whilst listening to Daniel Hering and Annette Battrup-Pedersen’s talks.

You can find the presentations by following this link: My talk is the first and, in it, I tried to lay out some of the strengths and weaknesses of the ways that we collect and use ecological data for managing lakes and rivers. I was aiming to give a high-level overview of the situation and, as I prepared, I found myself drawing, as I often seem to do, on medical and health-related metaphors.

At its simplest, ecological assessment involves looking at a habitat, collecting information about the types of communities that are present and matching this to knowledge that we have obtained from outside sources (such as books and teachers) and from prior experience in order to guide decisions about future management of that habitat. This may involve categorical distinctions (“this section of a river is okay, but that one is not”) but we often find that finer distinctions are necessary, much as when a doctor asks a patient to rate pain on a scale of one to ten. The doctor-patient analogy is important, because the outcomes from ecological assessment almost always need to be communicated to people with far less technical understanding than the person who collected the information in the first place.

I’ve had more opportunity than I would have liked to ruminate on these analogies in recent years as my youngest son was diagnosed with Type 1 diabetes in 2014 (see “Why are ecologists so obsessed with monitoring?”). One of the most impressive lessons for me was how the medical team at our local hospital managed both to stabilise his condition and to teach him the rudiments of managing his blood sugar levels in less than a week. He was a teenager with limited interest in science, so the complexities of measuring and interpreting blood sugar levels had to be communicated in a very practical manner. That he now lives a pretty normal life stands testament to their communication skills as much as to their medical skills.

The situation with diabetes offers a useful parallel to environmental assessment: blood sugar concentrations are monitored and evaluated against thresholds. If the concentration crosses these thresholds (too high or too low), then action is taken either to reduce or to increase blood sugar (inject insulin or eat some sugar or carbohydrates, respectively). Blood sugar concentrations change gradually over time and are measured on a continuous scale. However, for practical purposes they can be reduced to a simple “Goldilocks” formula (“too much”, “just right”, “not enough”). Behind each category lie, for a diabetic, powerful associations that reinforce the consequences of not taking action (if you have ever seen a diabetic suffering a “hypo”, you’ll know what I mean).

Categorical distinctions versus continuous scales embody the tensions at the heart of contemporary ecological assessment: a decision to act or not act is categorical yet change in nature tends to be more gradual.   The science behind ecological assessment tends to favour continuous scales, whilst regulation needs thresholds.  This is, indeed, captured in the Water Framework Directive (WFD): there are 38 references to “ecological status”, eight in the main text and the remainder in the annexes.  By contrast, there are just two references to “ecological quality ratios” – the continuous scale on which ecological assessment is based – both of which are in an annex.   Yet, somehow, these EQRs dominate conversation at most scientific meetings where the WFD is on the agenda.

You might think that this is an issue of semantics. For both diabetes and ecological assessment, we can simply divide a continuous measurement scale into categories, so what is the problem? For diabetes, I think that the associations between low blood sugar and unpleasant, even dangerous, consequences are such that it is not a problem. For ecological assessment, I’m not so sure. Like diabetes, our methods are able to convey the message that changes are taking place. Unlike diabetes, they often fail to finish the sentence with “… and bad things will happen unless you do something”. EQRs can facilitate geek-to-geek interactions but often fail to transmit the associations to non-technical audiences – managers and stakeholders – that make them sit up and take notice.
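For readers unfamiliar with EQRs, here is a minimal sketch of how a continuous EQR might be cut into the five status classes. The equally spaced boundaries (0.8, 0.6, 0.4, 0.2) are hypothetical placeholders rather than the values used by any particular method; the point is simply that two sites with almost identical EQRs can land in different classes, while nothing in the output says why either result should matter to a manager.

```python
# Sketch: map a continuous ecological quality ratio (EQR) onto the five
# WFD status classes. The boundaries are hypothetical, equally spaced
# placeholders; real boundaries are set per method and water-body type.

BOUNDARIES = [  # (lower EQR limit, status class), checked from the top down
    (0.80, "High"),
    (0.60, "Good"),
    (0.40, "Moderate"),
    (0.20, "Poor"),
    (0.00, "Bad"),
]


def status_class(eqr):
    """Return the status class for an EQR, clipping values to the 0-1 range."""
    eqr = min(max(eqr, 0.0), 1.0)
    for lower, label in BOUNDARIES:
        if eqr >= lower:
            return label
    return "Bad"  # not reached after clipping; kept as a safe fallback


# Two sites with almost identical EQRs fall either side of a boundary
for eqr in (0.61, 0.59):
    print(eqr, status_class(eqr))
```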

I’d like to think that we can build categorical “triggers” into methods that make more direct links with these “bad things”. In part, this would address the intrinsic uncertainty in our continuous scales (see “Certainly uncertain …”) but it would also greatly increase the ability of these methods to communicate risks and consequences to non-technical audiences (“look – this river is full of sewage fungus / filamentous algae – we must do something!”). That’s important because, whilst I think that the WFD is successful at setting out principles for sustainable management of water, it fails if considered only as a means for top-down regulation. In fact, I suspect that Article 14, which deals with public participation, has been more responsible for regulators not taking action (because “costs” are perceived as disproportionate to “benefits”) than for driving through improvements. We need to start thinking more about ensuring that ecologists are given the tools to communicate their concerns beyond a narrow circle of fellow specialists (see also “The democratisation of stream ecology?”). Despite all the research that the WFD has spawned, there has been a conspicuous failure to change “hearts and minds”. In the final analysis, that is going to trump ecological nuance in determining the scale of environmental improvement we should expect.

It’s all about the algae

Just a short post to point you all towards an article I wrote for the Royal Society of Biology’s magazine The Biologist. It is a broad overview of the reasons why we use algae to assess the condition of our lakes and rivers in Europe and is illustrated with three of Chris Carter’s beautiful images; the print edition will have even more of these. Take the figure legends with a pinch of salt (we didn’t write these!): neither Tolypella nor Chaetophora is particularly common in the UK. Navicula, on the other hand, is common but the legend makes no mention of this.

Whilst I have your attention, I will also point you towards a short article that I wrote for the most recent Phycological Bulletin, the newsletter of the Phycological Society of America.  This offers a few more hints to anyone thinking about entering the Hilda Canter-Lund competition next year.

Certainly uncertain …

Back in May I set out some thoughts on what the diatom-based metrics that we use for ecological assessment are actually telling us (see “What does it all mean?”). I suggested that diatoms (and, for that matter, other freshwater benthic algae) show four basic responses to nutrients and that the apparent continua of optima obtained from statistical models were the result of interactions with other variables such as alkalinity. However, this is still only a partial explanation for what we see in samples, which often contain species with a range of different responses to the nutrient gradient. At a purely computational level, this is not a major problem, as assessments are based on the average response of the assemblage. This assumes that the variation is stochastic, with no biological significance. In practice, standard methods for sampling phytobenthos destroy the structure and patchiness of the community at the location, and our understanding is further confounded by the microscopic scale of the habitats we are trying to interpret (see “Baffled by the benthos (1)”). But what if the variability that we observe in our samples is actually telling us something about the structure and function of the ecosystem?
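A minimal sketch of the “average response” idea: an abundance-weighted mean of species scores, with the weighted standard deviation alongside. The taxon names, scores and counts are hypothetical, and real indices differ in detail (some also weight each species by how reliable an indicator it is). The two samples share the same mean, but one is a tight cluster of similar taxa while the other combines taxa from opposite ends of the gradient; the mean alone cannot tell them apart.

```python
import math

# Hypothetical species scores on a 1-5 nutrient scale (not real taxa)
scores = {"taxon_a": 1.0, "taxon_b": 3.0, "taxon_c": 4.0, "taxon_d": 5.0}


def weighted_mean_sd(counts, scores):
    """Abundance-weighted mean and standard deviation of species scores.

    Taxa without a score are simply ignored here.
    """
    scored = {sp: n for sp, n in counts.items() if sp in scores}
    total = sum(scored.values())
    mean = sum(n * scores[sp] for sp, n in scored.items()) / total
    var = sum(n * (scores[sp] - mean) ** 2 for sp, n in scored.items()) / total
    return mean, math.sqrt(var)


# Two hypothetical samples (valve counts) with the same mean score
uniform = {"taxon_b": 50, "taxon_c": 50}
mixed = {"taxon_a": 150, "taxon_d": 250}

for name, sample in (("uniform", uniform), ("mixed", mixed)):
    mean, sd = weighted_mean_sd(sample, scores)
    print(f"{name}: mean {mean:.2f}, sd {sd:.2f}")
```

Both samples return a mean of 3.5; only the spread reveals that the second contains two very different kinds of community.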

One limitation of the transfer functions that I talked about in that earlier post is that they amalgamate information about individual species but do not use any higher-level information about community structure. Understanding more about community structure may help us to understand some of the variation that we see. In the graph below I have tried to visualise the four categories of response along the nutrient/organic gradient in a way that explains the overlap in occurrence of different types of response. I have put a vertical line on this graph so that we can focus on the community at one point along the pollution gradient, noting, in particular, that three different strategies can co-exist at the same level of pollution. Received wisdom amongst the diatom faithful is that the apparent variation we see in ecological preferences amongst the species in a single sample reflects inadequacies in our taxonomic understanding. My suggestion is that this is partly because we have not appreciated how species are arranged within a biofilm. I’ve tried to illustrate this with a diagram of a biofilm that might lead to this type of assemblage.

Schematic diagram showing the response of benthic algae along a nutrient/organic gradient. a.: taxa thriving in low nutrient / high oxygen habitats; b.: taxa thriving in high nutrient / high oxygen habitats; c.: taxa thriving in high nutrient / low oxygen habitats; d.: taxa thriving in high nutrient / very low oxygen habitats. H, G, M, P and B refer to high, good, moderate, poor and bad ecological status.

The dominant alga in many of the enriched rivers in my part of the world is the tough, branched filamentous green alga Cladophora glomerata. This, in turn, creates micro-habitats for a range of algae. Some algae, such as Rhoicosphenia abbreviata, Cocconeis pediculus and Chamaesiphon incrustans, thrive as epiphytes on Cladophora whilst others, such as Cocconeis euglypta, are often, but not exclusively, found in this microhabitat. Living on Cladophora filaments gives them better access to light and also means that their supply of oxygen is constantly replenished by the water (few rivers in the UK are, these days, so bereft of oxygen as to make this an issue). All of these species fit neatly into category b. in my earlier post.

Underneath the Cladophora filaments, however, there is a very different environment. The filaments trap organic and inorganic particulate matter, which supports a variety of protozoans, bacteria and fungi. These use up the limited oxygen in the water, possibly faster than it can be replenished, so any algae that live in this part of the biofilm need to be able to cope both with the shading from the Cladophora and with the low levels of oxygen. Many of the species that we find in highly polluted conditions are motile (e.g. Nitzschia palea), and so are able to adjust their positions constantly in order to access more light and other resources. They will also need to be able to cope with lower oxygen concentrations and, possibly, with consequences such as highly reducing conditions. These species will fit into categories c. and d. in the first diagram.

A stylised (and simplified) cross-section through a biofilm in a polluted river, showing how different algae may co-exist.   The biofilm is dominated by Cladophora glomerata (i.) with epiphytic Rhoicosphenia abbreviata (ii.), Cocconeis euglypta (iii.) and Chamaesiphon incrustans (iv.) whilst, lower down in the biofilm, we see motile Nitzschia palea (v.) and Fistulifera and Mayamaea species (vi.) growing in mucilaginous masses.

However, as the cross-section above represents substantially less than a millimetre of a real biofilm, its layers are almost impossible to keep apart when sampling, and we end up trying to make sense of a mess of different species. The ecologist’s default position is, inevitably, to name and count, then feed the outputs into a statistical program and hope for the best.

A final complication is that river beds are rarely uniform. The stones that make up the substrate vary in size and stability, so some are rolled by the current more frequently than others. There may be patches of faster and slower flow associated with the insides and outsides of meanders, plus areas with more or less shade. As a result, the patches of Cladophora will vary in thickness (some less stable stones will lack them altogether) and, along with this, so will the proportions of species exhibiting each of the strategies. The final twist, therefore, is that the vertical line that I drew on the first illustration to mark a point on a gradient is, itself, simplistic. As the proportions vary, so the position of that line will also shift. Any one sample (itself the amalgamation of at least five microhabitats) could appear at a number of different points on the gradient. Broadly speaking, uncertainty is embedded into the assessment of ecological status using phytobenthos as deeply as it is in quantum mechanics. We can manage uncertainty to some extent by taking care with those aspects that are within our control. However, in the final analysis, a sampling procedure that involves an organism 25,000 times larger than most diatoms blundering around a stream wielding a toothbrush is inevitably going to have limitations.
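A toy illustration, with entirely hypothetical numbers, of why one site can yield a range of positions on the gradient. Each stone is assumed to carry a different share of epiphytic taxa (category b, given an arbitrary score of 3.0) versus taxa from beneath the Cladophora (categories c/d, score 4.5), and a sample is assumed to pool equal amounts of biofilm from five stones. Enumerating every possible choice of five stones from eight gives the spread of scores that the rectangle in the final figure is meant to capture.

```python
from itertools import combinations

# Hypothetical scores for the two strategies (higher = more polluted)
B_SCORE, CD_SCORE = 3.0, 4.5

# Hypothetical fraction of each stone's algae that are category-b epiphytes;
# stones with thin or absent Cladophora have lower fractions
stones = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.2, 0.0]


def sample_score(selected):
    """Score of a pooled sample, assuming each stone contributes equally."""
    frac_b = sum(selected) / len(selected)
    return frac_b * B_SCORE + (1 - frac_b) * CD_SCORE


# Every way of picking five of the eight stones from the same site
scores = [sample_score(c) for c in combinations(stones, 5)]
print(f"{len(scores)} possible samples, scores {min(scores):.2f} to {max(scores):.2f}")
```

The site itself does not change between these samples; only the choice of stones does, yet the pooled score moves across a band rather than sitting on a single line.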

The same schematic diagram as that at the start of this article, but with the vertical line indicating the position of a hypothetical sample replaced by a rectangle representing the range of possibilities for samples at any one site.