When you look at an organism, how do you know what it is? That’s a big question that hovers over many of the posts that I write. I tell you the names of organisms and you believe me. Sometimes I do too. The truth is that we rarely give a second thought to the way that our brains process the constant stream of signals that our eyes send us as we observe the natural world. The subject intrigues me, but I only manage to scratch the surface in the posts that I write (see “Abstracting from reality …” and “Do we see through a microscope?” for some of these speculations).
The plate below offers a case study in this process. It shows a diatom we encountered in a recent ring test, and which most of us agreed was either Fragilaria austriaca or something quite similar. In binary terms, though, we have to be blunt: either it is Fragilaria austriaca or it is not, which may have implications for subsequent recording and interpretation (see “All exact science is dominated by the idea of approximation”). How come a group of experienced analysts can look at the same population of diatoms and reach different conclusions? I’ve got two suggestions: the first is that we differ in how we process the images, and the second is that there are sources of systematic error which confound our attempts to seek the right answer.
“Fragilaria austriaca” from Foreshield Burn, Cumbria, May 2019.
There are three basic strategies that we use to name an organism:
- Probabilistic reasoning, through the use of keys which, in theory at least, have a logical structure that guides a user to the correct identity of an unknown specimen. In practice, this is not quite as straightforward as it sounds (see “Empathy with the ignorant …”) and, at some point, many of us will abandon the formal structure of a key and switch to …
- Pattern recognition, which amounts to flicking through images until we find one that matches our specimen. We can then corroborate this preliminary match by checking the written description. In practice, we will probably switch from probabilistic reasoning to pattern recognition and back again as we home in on the identity of an unknown specimen. Repeating this process several times will lodge a schema of this species in our memories, leading to a third strategy:
- Recall. In practice, most of us have probably seen many of the common and even less-common species so often that we can bypass these first two steps completely because we recognise the species without recourse to any books.
Disagreements, then, arise partly because we use different books as part of our naming process, partly because our prior experiences differ, and partly because our discipline in checking measurements of our own specimens against descriptions is not always as good as it should be. With modern understanding of diatom species, boundaries between species are frequently being redrawn, and descriptions of newer species can often only be found in obscure journal articles, many of them behind paywalls, so knowledge of these diffuses through the community of diatomists more slowly than it should. However, our discussions about the identity of the mystery Fragilaria also revealed a further issue, which I’ve illustrated in the graph below.
When we switch from “pattern recognition” to “probabilistic reasoning” we often base decisions on categorical distinctions of continuous variables such as length and width. In this case, the literature quotes a maximum width of four micrometres for F. austriaca, and this was an important factor contributing to decisions about the correct identity. However, there were differences in our measurements, which meant that some decided that the population was too broad to qualify as F. austriaca whilst others decided that it fell within the correct range. The likelihood, based on these graphs, is that at least some of us were making incorrect measurements but, at this stage, we don’t know who they are.
Measurements of width, stria density and length of the population of “Fragilaria austriaca”. Six analysts were involved in total, using either the eyepiece (“E”), an image projected onto a screen (“S”) or a measuring program (“P”) to make measurements (some used more than one approach). The dashed lines show the upper and lower limits for each parameter.
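To make the width problem concrete, here is a minimal Python sketch of how a small systematic calibration error can flip a yes/no decision against the four micrometre limit. The widths and the 5% error are invented for illustration and are not the ring-test measurements; the function names are hypothetical too.

```python
# Hypothetical valve widths in micrometres (NOT the ring-test data).
true_widths = [3.6, 3.7, 3.8, 3.9, 3.9]

# Maximum width quoted in the literature for F. austriaca.
MAX_WIDTH_UM = 4.0


def apply_calibration(widths, scale):
    """Mimic a systematic calibration error as a multiplicative factor."""
    return [w * scale for w in widths]


def within_limit(widths, limit=MAX_WIDTH_UM):
    """True if no measured valve is wider than the quoted maximum."""
    return max(widths) <= limit


# Three imaginary analysts measuring the same valves (assumed error sizes).
analysts = [("well calibrated", 1.00), ("reads 5% high", 1.05), ("reads 5% low", 0.95)]

decisions = {
    label: within_limit(apply_calibration(true_widths, scale))
    for label, scale in analysts
}

for label, fits in decisions.items():
    verdict = "within range" if fits else "too broad"
    print(f"{label}: {verdict}")
```

In this made-up case the analyst whose microscope reads 5% high rejects F. austriaca whilst the other two accept it, even though all three looked at exactly the same valves.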
But that, itself, brings me to another point: do we know the correct size range of Fragilaria austriaca? In order to be sure, we would need measurements of both initial cells (the largest in a cell cycle) and cells at the point where they are about to undergo sexual reproduction (the smallest in the cell cycle), ideally from several populations. As this is rarely the case, we actually have three problems: first, is the description reliable? Second, are your measurements accurate? Third, we are using a point on a continuous scale as a criterion for a categorical judgement which implies perfect knowledge of the size range of the target population. Even if you are sure of your microscope’s calibration, the best you can say is that the largest valve that you saw in the sub-sub-sub-sub-sub-sub sample of the population that lived in the stream you sampled exceeds (or not) the largest valve that the original author measured in the sub-sub-sub-sub-sample that s/he examined. Several of our measurements just tip over four micrometres, the maximum width quoted in the literature for Fragilaria austriaca, but, given these other factors, is that enough to drive a decision? Statisticians are more comfortable predicting means, modes and medians than predicting extreme values. Taxonomists, by contrast, seem to have undue reverence for maxima and minima.
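The point about maxima can be illustrated with a short simulation. This is only a sketch under loud assumptions: widths are drawn from a normal distribution with an invented mean of 3.6 and standard deviation of 0.2 micrometres, which is unlikely to match the size distribution of a real diatom population. It shows that the largest valve you expect to see depends on how many valves you measure, even when the population itself does not change.

```python
import random
import statistics


def mean_sample_maximum(n_valves, n_repeats=500, mean=3.6, sd=0.2, seed=1):
    """Average of the largest width seen across many simulated samples.

    The normal distribution and its parameters are assumptions made
    purely for illustration, not estimates for any real population.
    """
    rng = random.Random(seed)
    maxima = [
        max(rng.gauss(mean, sd) for _ in range(n_valves))
        for _ in range(n_repeats)
    ]
    return statistics.mean(maxima)


# Same imaginary population, two different measuring efforts.
small_sample = mean_sample_maximum(10)
large_sample = mean_sample_maximum(200)

print(f"typical maximum from 10 valves:  {small_sample:.2f} micrometres")
print(f"typical maximum from 200 valves: {large_sample:.2f} micrometres")
```

With these invented parameters the typical maximum from 10 valves stays below four micrometres whilst the typical maximum from 200 valves exceeds it, so two analysts (or an analyst and the original author) could sit on opposite sides of the limit purely because of how many valves each measured.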
Molecular biologists are approaching similar questions with considerable vigour. The arrival of metabarcoding and high-throughput sequencing means that they have had to write complicated computer code (“bioinformatics pipelines”) to sort the millions of sequences that emerge from sequencers, matching as many as possible to sequences from organisms whose names we already know, in order to turn those sequences into data that biologists can use (see “When a picture is worth a thousand base pairs …”). We are conscious that decisions about software and settings within packages contribute to variations in the final output for reasons that we cannot always explain to our satisfaction. But, whilst engaged in these discussions about cutting-edge technology, I’m conscious that old-school biologists such as myself each perform our own private “bioinformatics” every time we try to name an organism and we don’t always agree on the outputs from these thought processes. Molecular biology, in a roundabout way, holds up a mirror to the way that we’ve been used to operating and should make us ask hard questions.
Some other highlights from this week:
Wrote this whilst listening to: my elderly vinyl copy of Mike Oldfield’s Tubular Bells
Cultural highlights: Milton Jones at Newcastle City Hall
Currently reading: Hilary Mantel’s The Mirror and The Light
Culinary highlight: polenta served with a mushroom and cheese sauce.
Finally, breaking news: I’m going to be live at the Green Man festival this August. More details of our event “Slime Time”, and all the other performers at Einstein’s Garden can be found here