The Imitation Game

About a year ago, I made a dire prediction about the future of diatom taxonomy in the new molecular age (see “Murder on the barcode express …“).   A year on, I thought I would return to this topic from a different angle, using the “Turing Test” in Artificial Intelligence as a metaphor.   The Turing Test (or “Imitation Game”) was devised by Alan Turing in 1950 as a test of a machine’s ability to exhibit intelligent behaviour indistinguishable from that of a human (encapsulated as “can machines do what we [as thinking entities] can do?”).

My primary focus over the past few years has not been the role of molecular biology in taxonomy, but rather the application of taxonomic information to decision-making by catchment managers.   So my own Imitation Game is not going to ask whether computers will ever identify microscopic algae as well as humans, but rather can they give the catchment manager the information they need to make a rational judgement about the condition of a river and the steps needed to improve or maintain that condition as well as a human biologist?

One of the points that I made in the earlier post is that current approaches based on light microscopy are already highly reductionist: a human analyst makes a list of species and their relative abundances which are processed using standardised metrics to assign a site to a status class. In theory, there is the potential for the human analysts to then add value to that assignment through their interpretations.  The extent to which that happens will vary from country to country but there are two big limitations: first, our knowledge of the ecology of diatoms is meagre (see earlier post) and, in any case, diatoms represent only a small part of the total diversity of microscopic algae and protists present in any river.   That latter point, in particular, is spurring some of us to start exploring the potential of molecular methods to capture this lost information but, at the same time, we expect to encounter even larger gaps in existing taxonomic knowledge than is the case for diatoms.
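The “name-and-count” pipeline can be reduced to a few lines of code, which is rather the point.  Here is a minimal sketch of an abundance-weighted metric in the spirit of indices such as the Trophic Diatom Index; the sensitivity scores and class boundaries are invented for illustration, not taken from any published method:

```python
# Sketch of a weighted-average diatom metric. The sensitivity scores and
# class boundaries below are hypothetical, for illustration only.

# hypothetical scores: 1 = nutrient-sensitive, 5 = nutrient-tolerant
SCORES = {
    "Achnanthidium minutissimum": 1,
    "Cocconeis pediculus": 3,
    "Nitzschia palea": 5,
}

def weighted_average_index(counts):
    """Abundance-weighted mean sensitivity of the scored species in a sample."""
    scored = {sp: n for sp, n in counts.items() if sp in SCORES}
    total = sum(scored.values())
    return sum(SCORES[sp] * n for sp, n in scored.items()) / total

def status_class(index, boundaries=(1.5, 2.5, 3.5, 4.5)):
    """Assign the continuous index value to one of five status classes."""
    for cls, upper in zip(("high", "good", "moderate", "poor"), boundaries):
        if index < upper:
            return cls
    return "bad"

sample = {"Achnanthidium minutissimum": 120,
          "Cocconeis pediculus": 60,
          "Nitzschia palea": 20}
idx = weighted_average_index(sample)
print(round(idx, 2), status_class(idx))  # prints: 2.0 good
```

Everything a skilled analyst saw down the microscope has, by this point, been collapsed into a single number and a class label; whatever interpretation follows has to be added back by a human.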

One very relevant question is whether this will even be perceived as a problem by the high-ups.  There is a very steep fall-off in technical understanding as one moves up through the management tiers of environmental regulators.   That’s inevitable (see “The human ecosystem of environmental management…“) but a consequence is that their version of the Imitation Game will be played to different rules from those of the Environment Agency’s Poor Bloody Infantry whose game, in turn, will not be the same as that of academic taxonomists and ecologists.  So we’ll have to consider each of these versions separately.

Let’s start with the two extreme positions: the traditional biologist’s desire to retain a firm grip on Linnaean taxonomy versus the regulator’s desire for molecular methods to imitate (if not better) the condensed nuggets of information that are the stock-in-trade of ecological assessment.   If the former’s Imitation Game consists of using molecular methods to capture the diversity of microalgae at least as well as human specialists, then we run immediately into a new conundrum: humans are, actually, not very good at doing this, and molecular taxonomy is one of the reasons we know this to be true.  Paper after paper has shown us the limitations of taxonomic concepts developed during the era of morphology-based taxonomy.  In the case of diatoms we are now in the relatively healthy position of a synergy between molecular and morphological taxonomy, but the outcomes usually indicate far more diversity than we are likely to be able to catalogue using formal Linnaean taxonomy, making this an implausible option in the short to medium term.

If we play to a set of views that is interested primarily in the end-product, and is less interested in how this is achieved, then it is possible that taxonomy-free approaches such as those advocated by Jan Pawlowski and colleagues, would be as effective as methods that use traditional taxonomy.   As no particular expertise is required to collect a phytobenthos sample, and the molecular and computing skills required are generic rather than specific to microalgae, the entire process could by-pass anyone with specialist understanding altogether.  The big advantages are that it overcomes the limitations of a dependence on libraries of barcodes of known species and, as a result, that it does not need to be limited to particular algal groups.  It also has the greatest potential to be streamlined and, so, is likely to be the cheapest way to generate usable information.   However, two big assumptions are built into this version of the Imitation Game: first, that there is absolutely no added value from knowing what species are present in a sample and, second, that it is, actually, legal. The second point relates to the requirement in the Water Framework Directive to assess “taxonomic composition” so we also need to ask whether a list of “operational taxonomic units” (OTUs) meets this requirement.

In between these two extremes, we have a range of options whereby there is some attempt to align molecular barcode data with taxonomy, but stopping short of trying to catalogue every species present.  Maybe the OTUs are aggregated to division, class, order or family rather than to genus or species?   That should be enough to give some insights into the structure of the microbial world (and be enough to stay legal!) and would also bring some advantages. Several of my posts from this summer have been about the strange behaviour of rivers during a heatwave and, having commented on the prominence and diversity of green algae during this period, it would be foolish to ignore a method that would pick up fluctuations between algal groups better than our present methods.   On the other hand, I’m concerned that an approach that only requires a match to a high-level taxonomic group will enable bioinformaticians and statisticians to go fishing for correlations with environmental variables without needing a strong conceptual framework behind their explorations.
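Mechanically, this middle road amounts to little more than summing read counts at a chosen rank.  A minimal sketch, in which the OTU lineages and read counts are invented for illustration (a real pipeline would take assignments from a classifier run against a reference database):

```python
# Sketch of aggregating OTU read counts to a higher taxonomic rank
# instead of forcing species-level names. Lineages and counts are
# hypothetical, for illustration only.
from collections import Counter

# hypothetical OTU -> (division, order) assignments
LINEAGES = {
    "OTU_001": ("Bacillariophyta", "Naviculales"),
    "OTU_002": ("Bacillariophyta", "Cymbellales"),
    "OTU_003": ("Chlorophyta", "Sphaeropleales"),
    "OTU_004": ("Chlorophyta", "Sphaeropleales"),
    "OTU_005": ("Cyanobacteria", "Nostocales"),
}

def aggregate(reads, rank=0):
    """Sum reads per taxon at the chosen rank (0 = division, 1 = order)."""
    totals = Counter()
    for otu, n in reads.items():
        totals[LINEAGES[otu][rank]] += n
    return dict(totals)

reads = {"OTU_001": 500, "OTU_002": 300, "OTU_003": 150,
         "OTU_004": 40, "OTU_005": 10}
print(aggregate(reads, rank=0))
# prints: {'Bacillariophyta': 800, 'Chlorophyta': 190, 'Cyanobacteria': 10}
```

The choice of rank is the whole argument in miniature: division-level totals would capture the diatom/green alga/blue-green fluctuations described above, at the price of discarding everything finer.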

My final version of the Imitation Game is the one played by the biologists in the laboratories around the country who are simultaneously generating the data used for national assessments and providing guidance on specific problems in their own local areas.   Molecular techniques may be able to generate the data but can they explain the consequences?  Let’s assume that a method in the near future aggregates algal barcodes into broad groups – greens, blue-greens, diatoms and so on – and that some metrics derived from these offer correlations with environmental pressures as strong as or stronger than those that are currently obtained.   The green algae are instructive in this regard: they encompass an enormous range of diversity from microscopic single cells such as Chlamydomonas and Ankistrodesmus through colonial forms (Pediastrum) and filaments, up to large thalli such as Ulva.   Even amongst the filamentous forms, some are signs of a healthy river whilst others can be a nuisance, smothering the stream bed with knock-on consequences for other organisms.   A biologist, surely, wants to know whether the OTUs represent single cells or filaments, and that will require discrimination of orders at least but in some cases this level of taxonomic detail will not be enough.   The net alga, Hydrodictyon (discussed in my previous post), is in the same family as Pediastrum, so we will need to be able to discriminate separate genera in this case to offer the same level of insight as a traditional biologist can provide.   We’ll also need to discriminate blue-green algae (Cyanobacteria) at least to order if we want to know whether we are dealing with forms that are capable of nitrogen fixation – a key attribute for anyone offering guidance on their management.

The primary practical role of Linnaean taxonomy, for an ecologist, is to organize data about the organisms present at a site and to create links to accumulated knowledge about the taxa present.    For many species of microscopic algae, as I stressed in “Murder on the barcode express …”, that accumulated knowledge does not amount to very much; but there are exceptions.  There are 8790 records on Google Scholar for Cladophora glomerata, for example, and 2160 for Hydrodictyon reticulatum.  That’s a lot of wisdom to ignore, especially for someone who has to answer the “so what” questions that follow any preliminary assessment of the taxa present at a site.  But, equally, there is a lot that we don’t know and molecular methods might well help us to understand this.   There will be both gains and losses as we move into this new era but, somehow, blithely casting aside hard-won knowledge seems to be a retrograde step.

Let’s end on a subversive note: I started out by asking whether “machines” (as a shorthand for molecular technology) can do the same as humans but the drive for efficiency over the last decade has seen a “production line” ethos creeping into ecological assessment.   In the UK this has been particularly noticeable since about 2010, when public sector finances were squeezed.   From that point on, the “value added” elements of informed biologists interpreting data from catchments they knew intimately started to be eroded away.   I’ve described three versions of the Imitation Game and suggested three different outcomes.  The reality is that the winners and losers will depend upon who makes the rules.  It brings me back to another point that I have made before (see “Ecology’s Brave New World …”): that problems will arise not because molecular technologies are being used in ecology, but due to how they are used.   It is, in the final analysis, a question about the structure and values of the organisations involved.

References

Apothéloz-Perret-Gentil, L., Cordonier, A., Straub, F., Iseli, J., Esling, P. & Pawlowski, J. (2017).  Taxonomy-free molecular diatom index for high-throughput eDNA monitoring.   Molecular Ecology Resources 17: 1231-1242.

Turing, A. (1950).  Computing machinery and intelligence.  Mind 59: 433-460.

The multiple dimensions of submerged biofilms …

My recent dabbling and speculation in the world of molecular biology and biochemistry (see “Concentrating on carbon …” and “As if through a glass darkly …”) reawakened deep memories of lectures on protein structure as an undergraduate and, in particular, the different levels at which we understand this.   These are:

  • Primary structure: the sequence of amino acids in the polypeptide chain;
  • Secondary structure: coils and folds along the polypeptide chain caused by hydrogen bonds between peptide groups;
  • Tertiary structure: three-dimensional organisation of protein molecules driven by hydrophobic interactions and disulphide bridges; and,
  • Quaternary structure: the agglomeration of two or more polypeptide groups to form a single functional unit.

This framework describes the journey from the basic understanding of the nature of a protein achieved by Frederick Sanger in the early 1950s, to the modern, more sophisticated awareness of how the structure of a protein determines its mode of action. I remember being particularly taken by a description of how sickle cell anaemia was caused by a change of a single amino acid in the haemoglobin molecule, altering the structure of the protein and, in the process, reducing its capacity to carry oxygen.

There is a metaphor for those of us who study biofilms here. To borrow the analogy of protein structure, the basic list of taxa and their relative abundance is the “primary structure” of a biofilm. Within this basic “name-and-count” we have various “flavours”, from diehard diatomists who ignore all other types of organisms through to those who go beyond counting to consider absolute abundance and cell size in their analyses. Whatever their predilection, however, they share a belief that raw taxonomic information, weighted in some way by quantity, yields enough information to make valid ecological inferences. And, indeed, there are strong precedents for this, especially when the primary goal is to understand broad-scale interactions between biofilms and their chemical environment.

But does this good understanding of the relationship between biofilm “primary structure” and chemistry come at the expense of a better understanding of the inter-relationships within the biofilm? And, turning that around, might these inter-relationships, in turn, inform a more nuanced interpretation of the relationship between the biofilm and its environment? So let’s push the metaphor with protein structure a little further and see where that leads us.

The “tertiary structure” of a submerged biofilm: this one shows the inter-relationships of diatoms within a Didymosphenia geminata colony.  Note how the long stalks of Didymosphenia provide substrates for Achnanthidium cells (on shorter stalks) and needle-like cells of Fragilaria and Ulnaria.   You can read more about this here.  The image at the top of the post shows a biofilm from the River Wyle, described in more detail here.

We could think of the “secondary structure” of a biofilm as the organisation of cellular units into functional groups. This would differentiate, for example, filaments from single cells, flagellates from non-flagellates and diatoms that live on long stalks from those that live adpressed to surfaces. It could also differentiate cells on the basis of physiology, distinguishing nitrogen-fixers from non-nitrogen fixers, for example. We might see some broad phylogenetic groupings emerging here (motility of diatoms, for example, being quite different from that of flagellated green algae) but also some examples of convergence, where functional groups span more than one algal division.

Quite a few people have explored this, particularly for diatoms, though results are not particularly conclusive. That might be because we cannot really understand the subtleties of biofilm functioning when information on every group except diatoms has been discarded, and it might be because people have largely been searching for broad-scale patterns when the forces that shape these properties work at a finer scale. General trends that have been observed include an increase in the proportion of motile diatoms along enrichment gradients. However, this has never really been converted into a “take-home message” that might inform the decisions that a catchment manager might take, and so such insights rarely form part of routine assessment methods.

Next, there is a “tertiary structure”, the outcome of direct relationships between organisms and environment, interdependencies amongst those organisms to form a three-dimensional matrix, and time. This is the most elusive aspect of biofilm structure, largely because it is invariably destroyed or, at best, greatly distorted during the sample collection and analysis phases. This has been little exploited in ecological studies, perhaps because it is less amenable to the reductive approach that characterises most studies of biofilms. But I think that there is potential here, at the very least, to place the outcomes of quantitative analyses into context.  We could, in particular, start to think about the “foundation species” – i.e. those that define the structure of the community by creating locally stable conditions (see the paper by Paul Dayton below).   This, in turn, gives us a link to a rich vein of ecological thinking, and helps us to understand not just how communities have changed but also why.

The tertiary structure of a Cladophora-dominated biofilm from the River Team, Co. Durham.  Cladophora, in this case, functions as a “foundation species”, creating a habitat within which other algae and microorganisms exist.   You can read more about this in “A return to the River Team”.

Finally, if we were looking for a biofilm “quaternary structure” we could, perhaps, think about how the composition at any single point in space and time grades and changes to mould the community to favour fine-scale “patchiness” in the habitat and also to reflect seasonal trends in factors that shape the community (such as grazing).   Biofilms, in reality, represent a constantly shifting set of “metacommunities” whose true complexity is almost impossible to capture with current sampling techniques.

Some of this thinking ties in with posts from earlier in the year (see, for example, “Certainly uncertain”, which draws on an understanding of tertiary structure to explain variability in assessments based on phytobenthos communities).  But there is more that could be done and I hope to use some of my posts in 2018 to unpick this story in a little more detail.

That’s enough from me for now.  Enjoy the rest of the festive season.

Selected references

Foundation species:

Dayton, P. K. (1972). Toward an understanding of community resilience and the potential effects of enrichments to the benthos at McMurdo Sound, Antarctica. pp. 81–96 in Proceedings of the Colloquium on Conservation Problems Allen Press, Lawrence, Kansas.

“secondary structure” of biofilms

Gottschalk, S. & Kahlert, M. (2012). Shifts in taxonomical and guild composition of littoral diatom assemblages along environmental gradients.  Hydrobiologia 694: 41-56.

Law, R., Elliott, J.A., & Thackeray, S.J. (2014).  Do functional or morphological classifications explain stream phytobenthic community assemblages?  Diatom Research 29: 309-324.

Molloy, J.M. (1992).  Diatom communities along stream longitudinal gradients.  Freshwater Biology, 28: 56-69.

Steinman, A.D., Mulholland, P.J. & Hill, W.R. (1992).  Functional responses associated with growth form in stream algae.  Journal of the North American Benthological Society 11: 229-243.

Tapolczai, K., Bouchez, A., Stenger-Kovács, C., Padisák, J. & Rimet, F. (2016).  Trait-based ecological classifications for benthic algae: review and perspectives.  Hydrobiologia 776: 1-17.

“tertiary structure” of biofilms

Bergey, E.A., Boettiger, C.A. & Resh, V.H. (1995).  Effects of water velocity on the architecture and epiphytes of Cladophora glomerata (Chlorophyta).  Journal of Phycology 31: 264-271.

Blenkinsopp, S.A. & Lock, M.A. (1994).  The impact of storm-flow on river biofilm architecture.   Journal of Phycology 30: 807-818.

Kelly, M.G. (2012).   The semiotics of slime: visual representation of phytobenthos as an aid to understanding ecological status.   Freshwater Reviews 5: 105-119.

Winning hearts and minds …

I write several of my posts whilst travelling, though am always conscious of the hypocrisy of writing an environmentally-themed blog whilst, at the same time, chalking up an embarrassing carbon footprint.  Last month, however, I participated in my first “eConference”, in which the participants were linked by the internet.  With over 200 people from all over Europe, and beyond, attending for all or part of the three days, there was a substantial environmental benefit and, whilst there was little potential for the “off-piste” conversations that are often as useful as the formal programme of a conference, there were some unexpected benefits.  I, for example, managed to get the ironing done whilst listening to Daniel Hering and Annette Baattrup-Pedersen’s talks.

You can find the presentations by following this link: https://www.ceh.ac.uk/get-involved/events/future-water-management-europe-econference.   My talk is the first and, in it, I tried to lay out some of the strengths and weaknesses of the ways that we collect and use ecological data for managing lakes and rivers.  I was aiming to give a high level overview of the situation and, as I prepared, I found myself drawing, as I often seem to do, on medical and health-related metaphors.

At its simplest, ecological assessment involves looking at a habitat, collecting information about the types of communities that are present and matching the information we collect to knowledge that we have obtained from outside sources (such as books and teachers) and from prior experience in order to guide decisions about future management of that habitat. At one level, this may involve categorical distinctions (“this section of a river is okay, but that one is not”) but we often find that finer distinctions are necessary, much as when a doctor asks a patient to articulate pain on a scale of one to ten.  The doctor-patient analogy is important, because the outcomes from ecological assessment almost always need to be communicated to people with far less technical understanding than the person who collected the information in the first place.

I’ve had more opportunity than I would have liked to ruminate on these analogies in recent years as my youngest son was diagnosed with Type I diabetes in 2014 (see “Why are ecologists so obsessed with monitoring?”).   One of the most impressive lessons for me was how the medical team at our local hospital managed to both stabilise his condition and teach him the rudiments of managing his blood sugar levels in less than a week.   He was a teenager with limited interest in science so the complexities of measuring and interpreting blood sugar levels had to be communicated in a very practical manner.  That he now lives a pretty normal life stands testament to their communication skills as much as to their medical skills.

The situation with diabetes offers a useful parallel to environmental assessment: blood sugar concentrations are monitored and evaluated against thresholds.  If the concentration crosses these thresholds (too high or too low), then action is taken to either reduce or increase blood sugar (inject insulin or eat some sugar or carbohydrates, respectively).   Blood sugar concentrations change gradually over time and are measured on a continuous scale.  However, for practical purposes they can be reduced to a simple “Goldilocks” formula (“too much”, “just right”, “not enough”).  Behind each category lie, for a diabetic, powerful associations that reinforce the consequences of not taking action (if you have ever seen a diabetic suffering a “hypo”, you’ll know what I mean).

Categorical distinctions versus continuous scales embody the tensions at the heart of contemporary ecological assessment: a decision to act or not act is categorical yet change in nature tends to be more gradual.   The science behind ecological assessment tends to favour continuous scales, whilst regulation needs thresholds.  This is, indeed, captured in the Water Framework Directive (WFD): there are 38 references to “ecological status”, eight in the main text and the remainder in the annexes.  By contrast, there are just two references to “ecological quality ratios” – the continuous scale on which ecological assessment is based – both of which are in an annex.   Yet, somehow, these EQRs dominate conversation at most scientific meetings where the WFD is on the agenda.
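The reduction from a continuous EQR to a categorical status class takes only a few lines, which is perhaps why EQRs dominate the scientific conversation while the classes carry the regulatory weight.  A sketch, with boundary values invented for illustration (real boundaries are set and intercalibrated per method and water-body type):

```python
# Sketch of mapping a continuous Ecological Quality Ratio (0-1) onto the
# five WFD status classes. Boundary values are hypothetical.

def eqr_to_status(eqr, boundaries=(0.8, 0.6, 0.4, 0.2)):
    """Return the status class whose lower boundary the EQR meets."""
    for cls, lower in zip(("high", "good", "moderate", "poor"), boundaries):
        if eqr >= lower:
            return cls
    return "bad"

# two sites whose EQRs differ only slightly fall either side of the
# good/moderate boundary: a gradual change forced into a binary decision
print(eqr_to_status(0.61), eqr_to_status(0.59))  # prints: good moderate
```

The awkwardness sits entirely at the boundaries: a shift of 0.02 in EQR is ecologically trivial but, straddling the good/moderate line, it is the difference between action and no action.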

You might think that this is an issue of semantics.  For both diabetes and ecological assessment, we can simply divide a continuous measurement scale into categories so what is the problem?   For diabetes, I think that the associations between low blood sugar and unpleasant, even dangerous consequences are such that it is not a problem.  For ecological assessment, I’m not so sure.  Like diabetes, our methods are able to convey the message that changes are taking place.  Unlike diabetes, they are often failing to finish the sentence with “… and bad things will happen unless you do something”.   EQRs can facilitate geek-to-geek interactions but often fail to transmit the associations to non-technical audiences – managers and stakeholders – that make them sit up and take notice.

I’d like to think that we can build categorical “triggers” into methods that make more direct links with these “bad things”.  In part, this would address the intrinsic uncertainty in our continuous scales (see “Certainly uncertain …”) but it would also greatly increase the ability of these methods to communicate risks and consequences to non-technical audiences (“look – this river is full of sewage fungus / filamentous algae – we must do something!”).   That’s important because, whilst I think that the WFD is successful at setting out principles for sustainable management of water, it fails if considered only as a means for top-down regulation.   In fact, I suspect that Article 14, which deals with public participation, is more often responsible for regulators not taking action (because “costs” are perceived as disproportionate to “benefits”) than for driving through improvements.   We need to start thinking more about ensuring that ecologists are given the tools to communicate their concerns beyond a narrow circle of fellow specialists (see also “The democratisation of stream ecology?”).   Despite all the research that the WFD has spawned, there has been a conspicuous failure to change “hearts and minds”.  In the final analysis, that is going to trump ecological nuance in determining the scale of environmental improvement we should expect.

Certainly uncertain …

Back in May I set out some thoughts on what the diatom-based metrics that we use for ecological assessment are actually telling us (see “What does it all mean?”).  I suggested that diatoms (and, for that matter, other freshwater benthic algae) showed four basic responses to nutrients and that the apparent continua of optima obtained from statistical models were the result of interactions with other variables such as alkalinity.   However, this is still only a partial explanation for what we see in samples, which often contain species with a range of different responses to the nutrient gradient.  At a purely computational level, this is not a major problem, as assessments are based on the average response of the assemblage. This assumes that the variation is stochastic, with no biological significance.  In practice, standard methods for sampling phytobenthos destroy the structure and patchiness of the community at the location, and our understanding is further confounded by the microscopic scale of the habitats we are trying to interpret (see “Baffled by the benthos (1)”).  But what if the variability that we observe in our samples is actually telling us something about the structure and function of the ecosystem?

One limitation of the transfer functions that I talked about in that earlier post is that they amalgamate information about individual species but do not use any higher level information about community structure.  Understanding more about community structure may help us to understand some of the variation that we see.   In the graph below I have tried to visualise the four categories of response along the nutrient/organic gradient in a way that explains the overlap in their occurrence.   I have put a vertical line on this graph in order that we can focus on the community at one point along the pollution gradient, noting, in particular, that three different strategies can co-exist at the same level of pollution.  Received wisdom amongst the diatom faithful is that the apparent variation we see in ecological preferences amongst the species in a single sample reflects inadequacies in our taxonomic understanding.  My suggestion is that this is partly because we have not appreciated how species are arranged within a biofilm.  I’ve tried to illustrate this with a diagram of a biofilm that might lead to this type of assemblage.

Schematic diagram showing the response of benthic algae along a nutrient/organic gradient.  a.: taxa thriving in low nutrient / high oxygen habitats; b.: taxa thriving in high nutrient / high oxygen habitats; c.: taxa thriving in high nutrient / low oxygen habitats; d.: taxa thriving in high nutrients / very low oxygen habitats.   H, G., M, P and B refer to high, good, moderate, poor and bad ecological status.

The dominant alga in many of the enriched rivers in my part of the world is the tough, branched filamentous green alga Cladophora glomerata.   This, in turn, creates micro-habitats for a range of algae.  Some algae, such as Rhoicosphenia abbreviata, Cocconeis pediculus and Chamaesiphon incrustans, thrive as epiphytes on Cladophora whilst others, such as Cocconeis euglypta, are often, but not exclusively, found in this microhabitat.  Living on Cladophora filaments gives them better access to light but also means that their supply of oxygen is constantly replenished by the water (few rivers in the UK are, these days, so bereft of oxygen as to make this an issue).   All of these species fit neatly into category b. in my earlier post.

Underneath the Cladophora filaments, however, there is a very different environment.  The filaments trap organic and inorganic particulate matter which are energy sources for a variety of protozoans, bacteria and fungi.   These use up the limited oxygen in the water, possibly faster than it can be replenished, so any algae that live in this part of the biofilm need to be able to cope with the shading from the Cladophora plus the low levels of oxygen.   Many of the species that we find in highly polluted conditions are motile (e.g. Nitzschia palea), and so are able to constantly adjust their positions, in order to access more light and other resources.   They will also need to be able to cope with lower oxygen concentrations and, possibly, with consequences such as highly reducing conditions.  These species will fit into categories c. and d. in the first diagram.

A stylised (and simplified) cross-section through a biofilm in a polluted river, showing how different algae may co-exist.   The biofilm is dominated by Cladophora glomerata (i.) with epiphytic Rhoicosphenia abbreviata (ii.), Cocconeis euglypta (iii.) and Chamaesiphon incrustans (iv.) whilst, lower down in the biofilm, we see motile Nitzschia palea (v.) and Fistulifera and Mayamaea species (vi.) growing in mucilaginous masses.

However, as the cross-section above represents substantially less than a millimetre of a real biofilm, it is almost impossible to keep these layers apart when sampling, and we end up trying to make sense of a mess of different species.   The ecologist’s default position is, inevitably, to name and count, then feed the outputs into a statistical program and hope for the best.

A final complication is that river beds are rarely uniform.  The stones that make up the substrate vary in size and stability, so some are rolled by the current more frequently than others.  There may be patches of faster and slower flow associated with the inside and outsides of meanders, plus areas with more or less shade.   As a result, the patches of Cladophora will vary in thickness (some less stable stones will lack them altogether) and, along with this, the proportions of species exhibiting each of the strategies.  The final twist, therefore, is that the vertical line that I drew on the first illustration to illustrate a point on a gradient is, itself, simplistic.  As the proportions vary, so the position of that line will also shift.  Any one sample (itself the amalgamation of at least five microhabitats) could appear at a number of different points on the gradient.  Broadly speaking, uncertainty is embedded into the assessment of ecological status using phytobenthos as deeply as it is in quantum mechanics.  We can manage uncertainty to some extent by taking care with those aspects that are within our control.   However, in the final analysis, a sampling procedure that involves an organism 25,000 times larger than most diatoms blundering around a stream wielding a toothbrush is invariably going to have limitations.

The same schematic diagram as that at the start of this article, but with the vertical line indicating the position of a hypothetical sample replaced by a rectangle representing the range of possibilities for samples at any one site. 

Primed for the unexpected?

I was in Nottingham last week for a CIEEM conference entitled “Skills for the future” where I led a discussion on the potential and pitfalls of DNA barcoding for the applied ecologist.  It is a topic that I have visited in this blog several times (see, for example, “Glass half full or glass half empty?”).  My original title was to have been “Integrating metabarcoding and “streamcraft” for improved ecological assessment in freshwaters”; however, this was deemed by the CIEEM’s marketing staff to be insufficiently exciting so I was asked to come up with a better one.  I was mildly piqued by the implication that my intended analysis of how to blend the old with the new was not regarded as sufficiently interesting so sent back “Metabarcoding: will it cost me my job?” as a facetious alternative.  They loved it.

So all I had to do was find something to say that would justify the title.   Driving towards Nottingham it occurred to me that the last time I should have made this trip was to Phil Harding’s retirement party.  I was invited, but had a prior engagement.  I would have loved to have been there as I have known Phil for a long time.  And, as I drew close to my destination, it occurred to me that Phil’s career neatly encapsulated the development of freshwater ecological assessment in the UK over the past 40 years.  He finished his PhD with Brian Whitton (who was also my supervisor) in the late 1970s and went off to work first for North West Water Authority and then for Severn Trent Water Authority.   When the water industry was privatised in 1989, he moved to the National Rivers Authority until that was absorbed into the Environment Agency in 1995.   Were he more ambitious he could have moved further into management, I am sure, but Phil was able to keep himself in jobs that got him out into the field at least occasionally throughout his career.   That means he has experienced the many changes that have occurred over the past few decades first hand.


Phil Harding: early days as a biologist with North West Water Authority in the late 70s

Phil had a fund of anecdotes about life as a freshwater biologist.  I remember one, in particular, about sampling invertebrates in a small stream in the Midlands as part of the regular surveys that biologists performed around their areas.   On this particular occasion he noticed that some of the invertebrate nymphs and larvae that he usually saw at this site were absent when he emptied out his pond net into a tray.   Curious to find out why, he waded upstream, kicking up samples periodically to locate the point at which these bugs reappeared in his net.   Once this had happened, he knew that he was upstream of the source of the problem and could focus on searching the surrounding land to find the cause.   On this occasion, he found a farmyard beside a tributary where there was a container full of pesticides that had leaked, poisoning the river downstream.

I recount this anecdote at intervals because it sums up the benefits of including biology within environmental monitoring programmes.   Chemistry is very useful, but samples are collected, typically, no more than once a month and, once in the laboratory, you find a chemical only if you set out to look for it and only if it was present in the river at the time that the sample was collected.  Chemical analysis of pesticides is expensive and the concentrations in rivers are notoriously variable, so the absence of a pesticide in a monthly water sample is no guarantee that it was never there.  The invertebrates live in the river all the time, and the aftershocks of an unexpected dose of pesticide are still reverberating a few weeks later when Phil rolls up with his pond net.   But success in this particular case depended on a) Phil being alert enough to notice the change and b) Phil having time for some ad hoc detective work.

This encapsulates the “streamcraft” which formed part of my original title.   This is the ability to “read” the messages in the stream that enable us to understand the processes that are taking place and, in turn, the extent to which man’s activities have altered these (see “Slow science and streamcraft”).  It is something you cannot be taught; you have to learn it out in the field, and the Environment Agency and its predecessors were, for a long while, well set up to allow this process of personal development.    Changes over the past few years, in the name of greater efficiency (and, to be fair, in the face of enormous budget cuts) have, I fear, seriously eroded this capability, not least because biologists spend far less time in the field, and are no longer responsible for collecting their own invertebrate or diatom samples.


Phil Harding: forty years on, sampling algae in the River Ashop in Derbyshire.

In my talk, I was thinking aloud about the interactions between metabarcoding and the higher level cognitive skills that a good biologist needs.   I feared that, in the wrong hands, it could be yet another means by which the role of the biologist was eroded to that of a technician feeding samples into one end of a series of swish machines, before staring at spreadsheets of data that emerged from the other end.   All the stages where the old school biologist might parse the habitat or sample s/he was investigating and collect signs and indications of its condition over and above the bare minimum set in the protocol were stripped away.

A further reason why this might be a problem is that molecular ecology takes a step backwards from the ideal of biological assessment.  Much as the chemist only sees what his chosen analyses allow him to see, so the molecular biologist will only “see” what his particular set of primers reveals.   Moreover, their interpretation of the spreadsheets of data that emerge is less likely to be qualified by their direct experience of the site because their time is now too precious, apparently, to allow them to collect samples for routine assessments.

A few points emerged out of the discussion that followed (the audience included representatives of both Environment Agency and Natural England).    First, we agreed that metabarcoding is not, itself, the problem; however, applying metabarcoding within an already-dysfunctional organisation might accentuate existing problems.  Second, budgets are under attack anyway and metabarcoding may well allow monitoring networks to be maintained at something approaching their present scale.  Third, the issue of “primers” was real but, as we move forward, it is likely that the primer sets will be expanded and a single analysis might pick up a huge range of information.  And, finally, the advent of new technologies such as the MinION might put the power of molecular biology directly into the hands of field biologists (rather than needing high throughput laboratories to harness economies of scale).

That last point is an important one: molecular ecology is a fast moving field with huge potential for better understanding of the environment.    However, we need to be absolutely clear that an ability to generate huge amounts of data will not translate automatically into that better understanding.   We will still need biologists with an ability to exercise higher cognitive skills and, therefore, organisations will need to provide biologists with opportunities to develop those skills. Metabarcoding, in other words, could be a good friend to the ecologist but will make a poor master.  In the short term, the rush to embrace metabarcoding because it is a) fashionable and b) cheap may erode capabilities that have taken years to develop and which will be needed if we are to get the full potential out of these methods.   What could possibly go wrong?

Identification by association?

A few months ago, I wrote briefly about the problems of naming and identifying very small diatoms (see “Picture this?”).   It is a problem that has stayed with me over the last few months, particularly as I oversee a regular calibration test for UK diatom analysts.   The most recent sample that we used for this exercise contained a population of the diatom formerly known as “Eolimna minima”, the subject of that post.   Using the paper by Carlos Wetzel and colleagues, we provisionally re-named this “Sellaphora atomoides”.   Looking back into my records, I noticed that we had also recorded “Eolimna minima” from an earlier slide used in the ring test.   These had a slightly less elliptical outline, and might well be “Sellaphora nigri” using the criteria that Wetzel and colleagues set out.   There are slight but significant differences in valve width, and S. nigri also has denser striation (though this is hard to determine with the light microscope).   These populations came from two streams with very different characteristics, so it is perhaps no surprise that these are two different species.


A population of “Eolimna minima” / Sellaphora cf. atomoides from an unnamed Welsh stream used in the UK/Ireland ring test (slide #39)  (photographs: Lydia King).

The differences in ecology are what concern me here.   Wetzel and colleagues focus on taxonomy in their paper but make a few comments on ecology too.  They write: “The general acceptance is that S. atomoides … is usually found in aerial habitats (or more “pristine” conditions) while the presence of Sellaphora nigri … is more related to human-impacted conditions of eutrophication, pesticides, heavy metal pollution and organically polluted environments”.  This statement is worrying because it suggests that the ecological divide between these two species is clear-cut.   Having spent 30 pages carefully dissecting a confusing muddle of species, it strikes me as counterproductive to repeat categorical statements made by earlier scientists whom they had just demonstrated to have had a limited grasp of the situation.

The risk is that a combination of slight differences in morphology coupled with (apparently) clear differences in ecology leads to the correct name being assigned based on the analyst’s interpretation of the habitat, rather than the characteristics of the organism.   This is not speculation on my part, as I have seen it happen during workshops.   On two occasions, the analysts involved were highly experienced.  Nonetheless, the justification for using a particular name, in each case, was that the other diatoms present suggested a certain set of conditions, which coincided with the stated preferences for one species, rather than with those for a morphologically-similar species.

I have no problem with environmental preferences being supporting information in the designation of a species – these can suggest physiological and other properties with a genetic basis that separate a species from closely-related forms.  However, I have great concerns about these preferences being part of the identification process for an analysis that is concerned, ultimately, with determining the condition of the environment.  It is circular reasoning but, nonetheless, I fear, widespread, especially for small taxa where we may need to discern characteristics that are close to the limits of resolution of the light microscope.
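The circularity can be made concrete with a deliberately simplified sketch. The 3 µm width threshold and the habitat rule below are invented purely for illustration – they are not the actual criteria from Wetzel and colleagues – but they show why only the first function belongs in an assessment workflow:

```python
# Hypothetical sketch of "identification by association".
# The 3.0 µm threshold and the habitat rule are invented for
# illustration; they are NOT the criteria in Wetzel et al. (2015).

def identify_by_morphology(valve_width_um: float) -> str:
    """Name assigned from the organism's own characters only."""
    return "Sellaphora nigri" if valve_width_um >= 3.0 else "Sellaphora atomoides"

def identify_by_association(valve_width_um: float, site_impacted: bool) -> str:
    """Circular: the analyst's reading of the habitat decides the name,
    and the name is then fed back (via indicator values) into an
    assessment of that same habitat."""
    if site_impacted:
        return "Sellaphora nigri"       # the "impacted" species is expected here
    return "Sellaphora atomoides"       # the "pristine" species is expected here
```

The second function will always confirm the analyst's prior judgement of the site, whatever is actually on the slide, which is exactly the problem described above.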

Gomphonema exilissimum is a case in point.  It is widely-regarded as a good indicator of low nutrients (implying good conditions) yet there have been papers recently that have pointed out that our traditional understanding based on the morphology of this species and close relatives is not as straightforward as we once thought.   Yet, the key in a widely-used guide to freshwater diatoms (written with ecological assessment in mind) contains the phrase “In oligotrophen, elektrolytarmen, meist schwach sauren Habitaten” (“in oligotrophic, electrolyte-poor, mostly weakly-acid habitats”) amongst the characters that distinguish it from close relatives.  The temptation to base an identification wholly or partly on an inference from the other diatoms present is great.

Including an important environmental preference in a key designed for use by people concerned with ecological assessment brings the credibility of the discipline into question.   Either a species can be clearly differentiated on the basis of morphology alone, or it has no place in evaluations that underpin enforcement of legislation.   That, however, takes us into dangerous territory: there is evidence that the limits of species determined by traditional microscopy do not always accord with other sources of evidence, in particular DNA sequence data.   These uncertainties, in turn, contribute to the vague descriptions and poor illustrations which litter identification guides, leaving the analyst (working under time pressure) to look for alternative sources of corroboration.  I suspect that many of us are guilty of “identification by association” at times.   We just don’t like to admit it.

References

Hofmann, G., Werum, M. & Lange-Bertalot, H. (2011).  Diatomeen im Süßwasser-Benthos von Mitteleuropa.  A.R.G. Gantner Verlag K.G., Rugell.  [the source of the key mentioned above]

Wetzel, C., Ector, L., Van de Vijver, B., Compère, P. & Mann, D.G. (2015). Morphology, typification and critical analysis of some ecologically important small naviculoid species (Bacillariophyta).  Fottea, Olomouc 15: 203-234.

Two papers that highlight challenges facing the identification of the Gomphonema parvulum complex (to which G. exilissimum belongs) are:

Kermarrec, L., Bouchez, A., Rimet, F. & Humbert, J.-F. (2013).  First evidence of the existence of semi-cryptic species and of a phylogeographic structure in the Gomphonema parvulum (Kützing) Kützing complex (Bacillariophyta).   Protist 164: 686-705.

Rose, D.T. & Cox, E.J. (2014).  What constitutes Gomphonema parvulum? Long-term culture studies show that some varieties of G. parvulum belong with other Gomphonema species.  Plant Ecology and Evolution 147: 366-373.

It’s just a box …


Today’s post starts with a linocut of an Illumina MiSeq Next Generation Sequencer (NGS), as part of an ongoing campaign to demystify these state-of-the-art £80,000 instruments. It’s just a box stuffed with clever electronics.   The problem is that tech-leaning biologists go misty-eyed at the very mention of NGS, and start to make outrageous claims for what it can do.   But how much are they actually going to change the way that we assess the state of the environment?   I approach this topic as an open-minded sceptic (see “Replaced by a robot?” and “Glass half full or glass half empty?” and other posts) but I have friends who know what buttons to press, and in what order. Thanks to them, enough of my samples have been converted into reams of NGS data for me now to be in a position to offer an opinion on their usefulness.

So here are three situations where I think that NGS may offer advantages over “traditional” biology:

  1. Reducing error / uncertainty when assessing variables with highly-contagious distributions.
    Many of the techniques under consideration measure “environmental DNA” (“eDNA”) in water samples. eDNA is DNA released into water from skin, faeces, mucus, urine and a host of other sources.   In theory, we no longer need to hunt for Great Crested Newts in ponds (a process with a high risk of “type 2 errors” – “false negatives”) but can take water samples and detect the presence of newts in the pond directly from these.  The same logic applies to lake fish, many of which move around the lake in shoals, which may be missed by samplers’ nets altogether or give false estimates of true abundance.   In both of these cases, the uncertainties in traditional methods can be reduced by increasing effort, but this comes at a cost, so methods based on eDNA show real potential (the Great Crested Newt method is already in use).
  2. Ensuring consistency when dealing with cryptic / semi-cryptic species
    I’ve written many posts about the problems associated with identifying diatoms.   We have ample evidence, now, that there are far more species than we thought 30 years ago. This, in turn, is challenging the ability to create consistent datasets when analysts spread across several different laboratories are trying to make fine distinctions between species based on a very diffuse literature.   Those of us who study diatoms now work at the very edge of what can be discriminated with the light microscope, and the limited data we now have from molecular studies suggest that there are sometimes genetic differences even when it is almost impossible to detect variation in morphology.   NGS has the potential for reducing the analytical error that results from these difficulties although, it is important to point out, many other factors (spatial and temporal) contribute to the overall variation between sites and, therefore, to our understanding of the effect of human pressures on diatom assemblages.
  3. Reducing costs
    This is one of the big benefits of NGS in the short term.   The reduction in cost is partly a result of the expenses associated with tackling the first two points by conventional means.   You can usually reduce uncertainty by increasing effort but, as resources are usually limited, this increase in effort means channelling funds that could be used more profitably elsewhere.   However, there will also be a straightforward time saving, because of the economies of scale that accompany high-throughput NGS.   A single run of an Illumina MiSeq can process 96 samples in a few hours, whereas each would have required one to two hours for analysis by light microscope. Even when the costs of buying and maintaining the NGS machines are factored in, NGS still offers a potential cost saving over conventional methods.
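The scale of that time saving can be sketched with some back-of-the-envelope arithmetic based on the figures above. The four-hour run time is an assumption standing in for “a few hours”, and 1.5 hours is simply the midpoint of the quoted microscopy range, so treat the numbers as illustrative rather than a real costing:

```python
# Back-of-envelope per-sample time comparison, using the figures in the
# text: 96 samples per MiSeq run (run time assumed ~4 h) versus 1-2
# analyst-hours per slide by light microscopy (midpoint 1.5 h assumed).

samples_per_run = 96
run_hours = 4.0           # assumption: "a few hours" for one MiSeq run
microscope_hours = 1.5    # assumption: midpoint of the quoted 1-2 hours

ngs_hours_per_sample = run_hours / samples_per_run
print(f"NGS:        {ngs_hours_per_sample:.3f} instrument-hours per sample")
print(f"Microscopy: {microscope_hours:.1f} analyst-hours per sample")
print(f"Throughput ratio: ~{microscope_hours / ngs_hours_per_sample:.0f}x")
```

Even before reagent and capital costs enter the picture, the per-sample time differs by more than an order of magnitude, which is why the economies-of-scale argument is so persuasive to managers.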

It is worth asking whether these three scenarios – statistical, taxonomic and financial – really amount to better science, or whether NGS is just a more efficient means of applying the same principles (“name and count”) that underpin most ecological assessment at present.   From a manager’s perspective, less uncertainty and lower cost is a beguiling prospect.   NGS may, as a result, give greater confidence in decision making, according to the current rules. That may make for better regulation, but it does not really represent a paradigm shift in the underlying science.

The potential, nonetheless, is there. A better understanding of genetic diversity, for example, may make it easier to build emerging concepts such as ecological resilience into ecological assessment (see “Baffled by the benthos (2)” and “Making what is important measurable”). Once we have established NGS as a working method, maybe we can assess functional genes as well as just taxonomic composition?   The possibilities are endless.  The Biomonitoring 2.0 group is quick to make these claims.   But it is important to remember that, at this stage, they are no more than possibilities.   So far, we are still learning to walk …

Reference

Baird, D.J. & Hajibabaei, M. (2012). Biomonitoring 2.0: a new paradigm in ecosystem assessment made possible by next-generation DNA sequencing. Molecular Ecology 21: 2039-2044.