About a year ago, I made a dire prediction about the future of diatom taxonomy in the new molecular age (see “Murder on the barcode express …”). A year on, I thought I would return to this topic from a different angle, using the “Turing Test” in Artificial Intelligence as a metaphor. The Turing Test (or “Imitation Game”) was devised by Alan Turing in 1950 as a test of a machine’s ability to exhibit intelligent behaviour indistinguishable from that of a human (encapsulated as “can machines do what we [as thinking entities] can do?”).
My primary focus over the past few years has not been the role of molecular biology in taxonomy, but rather the application of taxonomic information to decision-making by catchment managers. So my own Imitation Game is not going to ask whether computers will ever identify microscopic algae as well as humans, but rather can they give the catchment manager the information they need to make a rational judgement about the condition of a river and the steps needed to improve or maintain that condition as well as a human biologist?
One of the points that I made in the earlier post is that current approaches based on light microscopy are already highly reductionist: a human analyst makes a list of species and their relative abundances, which are processed using standardised metrics to assign a site to a status class. In theory, there is the potential for the human analysts to then add value to that assignment through their interpretations. The extent to which that happens will vary from country to country but there are two big limitations: first, our knowledge of the ecology of diatoms is meagre (see earlier post) and, second, diatoms represent only a small part of the total diversity of microscopic algae and protists present in any river. That latter point, in particular, is spurring some of us to start exploring the potential of molecular methods to capture this lost information but, at the same time, we expect to encounter even larger gaps in existing taxonomic knowledge than is the case for diatoms.
One very relevant question is whether this will even be perceived as a problem by the high-ups. There is a very steep fall-off in technical understanding as one moves up through the management tiers of environmental regulators. That’s inevitable (see “The human ecosystem of environmental management…”) but a consequence is that their version of the Imitation Game will be played to different rules from that of the Environment Agency’s Poor Bloody Infantry whose game, in turn, will not be the same as that of academic taxonomists and ecologists. So we’ll have to consider each of these versions separately.
Let’s start with the two extreme positions: the traditional biologist’s desire to retain a firm grip on Linnaean taxonomy versus the regulator’s desire for molecular methods to imitate (if not better) the condensed nuggets of information that are the stock-in-trade of ecological assessment. If the former’s Imitation Game consists of using molecular methods to capture the diversity of microalgae at least as well as human specialists, then we run immediately into a new conundrum: humans are, actually, not very good at doing this, and molecular taxonomy is one of the reasons we know this to be true. Paper after paper has shown us the limitations of taxonomic concepts developed during the era of morphology-based taxonomy. In the case of diatoms we are now in the relatively healthy position of a synergy between molecular and morphological taxonomy, but the outcomes usually indicate far more diversity than we are likely to be able to catalogue using formal Linnaean taxonomy, so this is not a plausible option in the short to medium term.
If we play to a set of views that is interested primarily in the end-product, and is less interested in how this is achieved, then it is possible that taxonomy-free approaches such as those advocated by Jan Pawlowski and colleagues would be as effective as methods that use traditional taxonomy. As no particular expertise is required to collect a phytobenthos sample, and the molecular and computing skills required are generic rather than specific to microalgae, the entire process could by-pass anyone with specialist understanding altogether. The big advantages are that it overcomes the limitations of a dependence on libraries of barcodes of known species and, as a result, that it does not need to be limited to particular algal groups. It also has the greatest potential to be streamlined and, so, is likely to be the cheapest way to generate usable information. However, two big assumptions are built into this version of the Imitation Game: first, that there is absolutely no added value from knowing what species are present in a sample and, second, that it is, actually, legal. The second point relates to the requirement in the Water Framework Directive to assess “taxonomic composition” so we also need to ask whether a list of “operational taxonomic units” (OTUs) meets this requirement.
In between these two extremes, we have a range of options whereby there is some attempt to align molecular barcode data with taxonomy, but stopping short of trying to catalogue every species present. Maybe the OTUs are aggregated to division, class, order or family rather than to genus or species? That should be enough to give some insights into the structure of the microbial world (and be enough to stay legal!) and would also bring some advantages. Several of my posts from this summer have been about the strange behavior of rivers during a heatwave and, having commented on the prominence and diversity of green algae during this period, it would be foolish to ignore a method that would pick up fluctuations between algal groups better than our present methods. On the other hand, I’m concerned that an approach that only requires a match to a high-level taxonomic group will enable bioinformaticians and statisticians to go fishing for correlations with environmental variables without needing a strong conceptual basis behind their explorations.
My final version of the Imitation Game is the one played by the biologists in the laboratories around the country who are simultaneously generating the data used for national assessments and providing guidance on specific problems in their own local areas. Molecular techniques may be able to generate the data but can they explain the consequences? Let’s assume that a method in the near future aggregates algal barcodes into broad groups – greens, blue-greens, diatoms and so on – and that some metrics derived from these offer correlations with environmental pressures as strong as or stronger than those that are currently obtained. The green algae are instructive in this regard: they encompass an enormous range of diversity from microscopic single cells such as Chlamydomonas and Ankistrodesmus through colonial forms (Pediastrum) and filaments, up to large thalli such as Ulva. Even amongst the filamentous forms, some are signs of a healthy river whilst others can be a nuisance, smothering the stream bed with knock-on consequences for other organisms. A biologist, surely, wants to know whether the OTUs represent single cells or filaments, and that will require discrimination of orders at least, but in some cases even this level of taxonomic detail will not be enough. The net alga, Hydrodictyon (discussed in my previous post), is in the same family as Pediastrum so we will need to be able to discriminate separate genera in this case to offer the same level of insight as a traditional biologist can provide. We’ll also need to discriminate blue-green algae (Cyanobacteria) at least to order if we want to know whether we are dealing with forms that are capable of nitrogen fixation – a key attribute for anyone offering guidance on their management.
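To make the point about taxonomic resolution concrete, here is a minimal sketch of what aggregating OTUs to different ranks might look like. The OTU identifiers and read counts are invented for illustration, and the `aggregate` function is a hypothetical helper rather than part of any real pipeline; only the lineages themselves (Hydrodictyon and Pediastrum sharing the family Hydrodictyaceae, Cladophora in the Cladophoraceae) are real.

```python
from collections import Counter

# Hypothetical OTU table: IDs and read counts are invented for illustration.
# Each OTU carries the lineage that its barcode was matched to.
OTUS = [
    {"id": "OTU_1", "reads": 120, "class": "Chlorophyceae",
     "family": "Hydrodictyaceae", "genus": "Hydrodictyon"},
    {"id": "OTU_2", "reads": 45, "class": "Chlorophyceae",
     "family": "Hydrodictyaceae", "genus": "Pediastrum"},
    {"id": "OTU_3", "reads": 300, "class": "Ulvophyceae",
     "family": "Cladophoraceae", "genus": "Cladophora"},
]


def aggregate(otus, rank):
    """Sum read counts for all OTUs that share the same name at `rank`."""
    totals = Counter()
    for otu in otus:
        totals[otu[rank]] += otu["reads"]
    return dict(totals)


# At family level the net alga and the colonial form are pooled together.
by_family = aggregate(OTUS, "family")

# At genus level they can be told apart again.
by_genus = aggregate(OTUS, "genus")
```

Aggregated by family, the Hydrodictyon and Pediastrum reads collapse into a single Hydrodictyaceae total, and the distinction a biologist would care about is lost. It reappears only when the same table is aggregated by genus.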
The primary practical role of Linnaean taxonomy, for an ecologist, is to organize data about the organisms present at a site and to create links to accumulated knowledge about the taxa present. For many species of microscopic algae, as I stressed in “Murder on the barcode express …”, that accumulated knowledge does not amount to very much; but there are exceptions. There are 8790 records on Google Scholar for Cladophora glomerata, for example, and 2160 for Hydrodictyon reticulatum. That’s a lot of wisdom to ignore, especially for someone who has to answer the “so what” questions that follow any preliminary assessment of the taxa present at a site. But, equally, there is a lot that we don’t know and molecular methods might well help us to understand this. There will be both gains and losses as we move into this new era but, somehow, blithely casting aside hard-won knowledge seems to be a retrograde step.
Let’s end on a subversive note: I started out by asking whether “machines” (as a shorthand for molecular technology) can do the same as humans but the drive for efficiency over the last decade has seen a “production line” ethos creeping into ecological assessment. In the UK this has been particularly noticeable since about 2010, when public sector finances were squeezed. From that point on, the “value added” elements of informed biologists interpreting data from catchments they knew intimately started to be eroded away. I’ve described three versions of the Imitation Game and suggested three different outcomes. The reality is that the winners and losers will depend upon who makes the rules. It brings me back to another point that I have made before (see “Ecology’s Brave New World …”): that problems will arise not because molecular technologies are being used in ecology, but due to how they are used. It is, in the final analysis, a question about the structure and values of the organisations involved.
Apothéloz-Perret-Gentil, L., Cordonier, A., Straub, F., Iseili, J., Esling, P. & Pawlowski, J. (2017). Taxonomy-free molecular diatom index for high-throughput eDNA monitoring. Molecular Ecology Resources 17: 1231-1242.
Turing, A. (1950). Computing machinery and intelligence. Mind 59: 433-460.