Disagreeable distinctions …

When you look at an organism, how do you know what it is?   That’s a big question that hovers over many of the posts that I write.   I tell you the names of organisms and you believe me. Sometimes I do too.   The truth is that we accept, without a second thought, the way that our brains process the constant stream of signals that our eyes send us as we observe the natural world.   The subject intrigues me, but I only manage to scratch the surface in the posts that I write (see “Abstracting from reality …” and “Do we see through a microscope?” for some of these speculations).

The plate below offers a case study in this process.  It shows a diatom we encountered in a recent ring test, and which most of us agreed was either Fragilaria austriaca or something quite similar.   In binary terms, though, we have to be blunt: either it is Fragilaria austriaca or it is not, which may have implications for subsequent recording and interpretation (see “All exact science is dominated by the idea of approximation”).   How come a group of experienced analysts can look at the same population of diatoms and reach different conclusions?   I’ve got two suggestions: the first is that we differ in how we process the images, and the second is that there are sources of systematic error which confound our attempts to seek the right answer.


“Fragilaria austriaca” from Foreshield Burn, Cumbria, May 2019.

There are three basic strategies that we use to name an organism:

  • Probabilistic reasoning, through the use of keys which, in theory at least, have a logical structure that guides a user to the correct identity of an unknown specimen. In practice, this is not quite as straightforward as it sounds (see “Empathy with the ignorant …”) and, at some point, many of us will abandon the formal structure of a key and switch to …
  • Pattern recognition, which amounts to flicking through images until we find one that matches our specimen. We can then corroborate this preliminary match by checking the written description.  In practice, we will probably switch from probabilistic reasoning to pattern recognition and back again as we home in on the identity of an unknown specimen. Repeating this process several times will lodge a schema of this species in our memories, leading to a third strategy:
  • Recall. In practice, most of us probably have seen many of the common and even less-common species so often that we can by-pass these first two steps completely because we recognise the species without recourse to any books.

Disagreements, then, arise partly because we use different books as part of our naming process, partly because our prior experiences differ and partly because our discipline in checking measurements of our own specimens against descriptions is not always as good as it should be.   Moreover, as our understanding of diatom species develops, boundaries between species are frequently redrawn, and descriptions of newer species can often only be found in obscure journal articles, many of them behind paywalls, so knowledge of them diffuses through the community of diatomists more slowly than it should.   However, our discussions about the identity of the mystery Fragilaria also revealed a further issue, which I’ve illustrated in the graph below.

When we switch from “pattern recognition” to “probabilistic reasoning” we often base decisions on categorical distinctions of continuous variables such as length and width.  In this case, the literature quotes a maximum width of four micrometres for F. austriaca, and this was an important factor contributing to decisions about the correct identity.  However, there were differences in our measurements, which meant that some decided that the population was too broad to qualify as F. austriaca whilst others decided that it fell within the correct range.   The likelihood, based on these graphs, is that at least some of us were making incorrect measurements but, at this stage, we don’t know who they are.
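To see how sensitive a categorical decision can be to small measurement differences, here is a minimal sketch in Python. The valve widths and the calibration factors are invented for illustration (they are not our ring-test data), and the function names are my own; only the four-micrometre limit comes from the literature quoted above.

```python
# Hypothetical illustration: the same valves, measured by analysts whose
# calibration differs by a few percent, judged against a fixed upper limit.

MAX_WIDTH_UM = 4.0  # maximum width quoted in the literature for F. austriaca

# Invented "true" valve widths in micrometres (NOT real ring-test data).
true_widths = [3.6, 3.7, 3.8, 3.9, 3.9]

# Invented calibration factors: 1.00 is perfect, 1.05 over-reads by 5%.
analysts = {"A": 0.97, "B": 1.00, "C": 1.05}


def within_limit(widths, limit=MAX_WIDTH_UM):
    """Return True if the widest measured valve does not exceed the limit."""
    return max(widths) <= limit


def verdicts(widths, calibration_factors):
    """Apply each analyst's calibration factor, then the categorical test."""
    results = {}
    for name, factor in calibration_factors.items():
        measured = [round(w * factor, 2) for w in widths]
        results[name] = within_limit(measured)
    return results


if __name__ == "__main__":
    for name, ok in verdicts(true_widths, analysts).items():
        print(name, "within range" if ok else "too broad")
```

In this made-up case a five per cent over-read is enough to push the widest valve past the limit, so analyst C rejects a population that analysts A and B accept, even though all three looked at identical valves.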


Measurements of width, stria density and length of the population of “Fragilaria austriaca”.  Six analysts were involved in total, using either the eyepiece (“E”), an image projected onto a screen (“S”) or a measuring program (“P”) to make measurements (some used more than one approach). The dashed lines show the upper and lower limits for each parameter.

But that, itself, brings me to another point: do we know the correct size range of Fragilaria austriaca?  In order to be sure, we would need measurements of both initial cells (the largest in a cell cycle) and cells at the point where they are about to undergo sexual reproduction (the smallest in the cell cycle), ideally from several populations.  As this is rarely the case, we actually have three problems: first, is the description reliable? Second, are your measurements accurate? Third, we are using a point on a continuous scale as a criterion for a categorical judgement which implies perfect knowledge of the size range of the target population.  Even if you are sure of your microscope’s calibration, the best you can say is that the largest valve that you saw in the sub-sub-sub-sub-sub-sub sample of the population that lived in the stream you sampled exceeds (or not) the largest valve that the original author measured in the sub-sub-sub-sub-sample that s/he examined.   Several of our measurements just tip over four micrometres, the maximum width quoted in the literature for Fragilaria austriaca but, given these other factors, is that enough to drive a decision?   Statisticians are more comfortable predicting means, modes and medians than predicting extreme values.   Taxonomists, by contrast, seem to have undue reverence for maxima and minima.
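The point about extremes can be illustrated with a small simulation. This is only a sketch under stated assumptions: I have assumed, purely for convenience, that valve widths in a population follow a normal distribution, and the mean (3.6 micrometres) and standard deviation (0.2 micrometres) are invented rather than measured. The function names are hypothetical too.

```python
import random
import statistics

# ASSUMPTIONS (for illustration only): widths are normally distributed,
# with an invented mean and standard deviation in micrometres.
ASSUMED_MEAN_UM = 3.6
ASSUMED_SD_UM = 0.2


def sample_mean_and_max(sample_size, rng):
    """Draw one sample of valve widths and return its mean and its maximum."""
    widths = [rng.gauss(ASSUMED_MEAN_UM, ASSUMED_SD_UM) for _ in range(sample_size)]
    return statistics.mean(widths), max(widths)


def average_over_trials(sample_size, trials=500, seed=1):
    """Repeat the sampling many times; average the sample means and maxima."""
    rng = random.Random(seed)  # fixed seed so that runs are repeatable
    means, maxima = [], []
    for _ in range(trials):
        sample_mean, sample_max = sample_mean_and_max(sample_size, rng)
        means.append(sample_mean)
        maxima.append(sample_max)
    return statistics.mean(means), statistics.mean(maxima)


if __name__ == "__main__":
    for n in (10, 100, 1000):
        avg_mean, avg_max = average_over_trials(n)
        print(f"n={n:5d}  average mean={avg_mean:.2f}  average max={avg_max:.2f}")
```

Under these assumptions the average of the sample means barely moves as the number of valves measured increases, whereas the average maximum creeps steadily upwards. The "largest valve seen" therefore tells you as much about how many valves were measured as it does about the species.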

Molecular biologists are approaching similar questions with considerable vigour.   The arrival of metabarcoding and high throughput sequencing means that they have had to write complicated computer code (“bioinformatics pipelines”) to sort the millions of sequences that emerge from sequencers, matching as many as possible to sequences from organisms whose names we already know, in order to turn those sequences into data that biologists can use (see “When a picture is worth a thousand base pairs …”).   We are conscious that decisions about software and settings within packages contribute to variations in the final output for reasons that we cannot always answer to our satisfaction.  But, whilst engaged in these discussions about cutting-edge technology, I’m conscious that old-school biologists such as myself each perform our own private “bioinformatics” every time we try to name an organism and we don’t always agree on the outputs from these thought processes.   Molecular biology, in a roundabout way, holds up a mirror to the way that we’ve been used to operating and should make us ask hard questions.

Some other highlights from this week:

Wrote this whilst listening to: my elderly vinyl copy of Mike Oldfield’s Tubular Bells

Cultural highlights:  Milton Jones at Newcastle City Hall

Currently reading:  Hilary Mantel’s The Mirror and The Light

Culinary highlight: polenta served with a mushroom and cheese sauce.

Finally, breaking news: I’m going to be live at the Green Man festival this August.  More details of our event “Slime Time”, and all the other performers at Einstein’s Garden can be found here


Empathy with the ignorant …

One of my ongoing projects is to produce a “beginner’s guide” to freshwater diatoms.  I wrote one about twenty years ago but the taxonomy is now seriously out of date.   I also illustrated it with line drawings rather than photographs.  Strange to think now but these were the last years of film photography and the quality of photomicrographs that we now take for granted was a lot harder to achieve.   Nonetheless, I like to think that it played a useful role, if not as the definitive guide to the diatoms of the UK then as a “phrase book” that helps beginners understand the foreign language that hardcore diatomists talk.

Most of us refer to identification guides, informally, as “keys” yet, paradoxically, experienced biologists often skip the keys and flick through the pictures, using pattern recognition in preference to probabilistic reasoning to name specimens.  There is a reason for this: most keys are not very good.  I’ll go further: by the time a biologist has enough experience of a group of organisms to write a key, s/he has forgotten what it is like to be a beginner.   What is missing from most keys is empathy with the ignorant.

Let’s take, as an example, an early couplet in the key in Freshwater Diatoms of Central Europe which asks if the cells you are looking at have internal septa.   Septa are thin, silica plates which project into the cell space (see figure a. below) and are useful diagnostic characters for some araphid diatoms, in particular.   Most keys to freshwater diatoms would have a couplet such as this at an early stage.   However, using septa as a diagnostic characteristic carries some disadvantages because they are part of the structure of girdle bands, not the valve.   In the case of Tabellaria, the most common septa-bearing genus in freshwater (see image below), the cells often disintegrate during slide preparation so that our struggling beginner is faced with a few valves (b.) and a larger number of girdle bands (c. and d.).   Most of the features that are useful for identification are on the valve itself but here, just to spice up the beginner’s experience, the septa are on the girdle bands.   So the beginner has to detect a thin plate of silica (virtually a flat piece of glass) that is mounted between a slide and a coverslip (two more flat pieces of glass) in order to progress to the next step which asks about characteristics on an entirely separate structure.

Tabellaria flocculosa from the River Broom in north-west Scotland with septa indicated by arrows.   a. whole frustule in girdle view; b. valve; c. and d. girdle bands.  Scale bar: 10 micrometres (= 100th of a millimetre).

Just to make matters more confusing, some diatoms (Rhoicosphenia and some Gomphonema, for example) have “pseudosepta”, which are similar to septa but are part of the valve itself rather than the girdle band.  This should not be a problem when using keys because questions about septa usually come after questions about whether a raphe is present or not.  That should have steered our beginners away from Gomphonema and Rhoicosphenia except that one valve of Rhoicosphenia has a very short raphe that a beginner might well have missed.

Rhoicosphenia abbreviata from the River Derwent, north-east England with pseudosepta indicated by arrows.   a. and c. inner (concave) valve; b. outer (convex) valve (pseudosepta present but not in focus); d. whole frustule in girdle view.  Scale bar: 10 micrometres (= 100th of a millimetre).   Photos: Ingrid Jüttner.

Whilst, in theory, the logical structure of a key should take the user infallibly to the right taxon, in practice, users tend to use the key only until the point where they encounter a couplet that cannot be easily tackled.  At this point, they switch from probabilistic reasoning to pattern recognition – they flick through the images, in other words, until they find one that matches the specimen that they are trying to identify.   Then they use the descriptions to confirm (or not) their hunch.   The key may fail because the writer assumed too much knowledge on the part of the user, because the specimen is not “typical” (that’s for another post!) or the user’s equipment is not as good as the writer expected.   I suspect that the first of these is most often the case, because the experts who write the keys have, quite simply, forgotten what it is like to be a beginner.
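The way a key works, and the point at which a user abandons it, can be sketched in a few lines of Python. This is a toy of my own devising, not the key from Freshwater Diatoms of Central Europe: the two couplets, the endpoints and the function name are all simplified assumptions chosen to echo the raphe and septa questions discussed above.

```python
# A toy dichotomous key (hypothetical and grossly simplified). Each couplet
# asks a yes/no question; each branch leads to another couplet or an endpoint.
TOY_KEY = {
    "question": "Is a raphe present?",
    "yes": {"taxon": "raphid diatoms (continue in another part of the key)"},
    "no": {
        "question": "Are internal septa present?",
        "yes": {"taxon": "septate araphid genera such as Tabellaria"},
        "no": {"taxon": "other araphid diatoms"},
    },
}


def identify(key, answers):
    """Walk the key using the user's answers (question -> True or False).

    A missing answer means the user cannot tackle that couplet, which is
    the point where, in practice, they start flicking through the plates.
    """
    node = key
    path = []  # the couplets answered so far
    while "taxon" not in node:
        question = node["question"]
        answer = answers.get(question)
        if answer is None:
            return ("stuck", question, path)
        path.append((question, answer))
        node = node["yes"] if answer else node["no"]
    return ("identified", node["taxon"], path)


if __name__ == "__main__":
    # A beginner who can see there is no raphe but cannot judge the septa.
    print(identify(TOY_KEY, {"Is a raphe present?": False}))
```

The design choice worth noting is the "stuck" outcome: a key that only models correct answers assumes the user can always answer, which is exactly the assumption that fails for beginners.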

I do use keys a lot when teaching because I think that the repetition of a series of (more-or-less) logical steps drives home the elements of diatom morphology that beginners need in order to put names on different genera and species.  Once they have got these basics in their heads, then I am happy for them to switch to pattern recognition rather than probabilistic reasoning.   What I suspect happens is that schemata of most of the genera get lodged in their memory, and they can then use this information to find the right set of images from which to match their unknown specimen.   The key is an important part of this process so effort put into writing a really good key should be worthwhile.

The problem lies in understanding what we mean by “really good”.   There is a risk that we define the quality of identification guides in terms of taxonomy whereas didactics plays an equally important role and we cannot assume that someone who is an expert on the former is as knowledgeable about the latter.   The Field Studies Council AIDGAP guides set a fine example by insisting on end-user testing to ensure the usability of keys, but these principles have not filtered through to the wider academic community.   Remembering what it was like to be a complete beginner is a good start.

That’s laid out the theory. Next job is to turn this into practice ….


More about the development of keys to identify diatoms in “The decline and fall of a CD-ROM”

I also contributed to a European standard which set out the principles behind writing good keys for applied ecology:

EN 16164:2013.  Water quality – Guidance standard for designing and selecting taxonomic keys.   Comité Européen de Normalisation (CEN), Geneva, 12pp.

Tales of Hofmann …


For the past five years or so the constant companion on my desk whilst I stare down my microscope has been a thick tome (2.8 kg) entitled Diatomeen im Süßwasser-Benthos von Mitteleuropa by Gabi Hofmann and colleagues.  It serves as my aide-mémoire when I am analysing freshwater diatoms, jogging my memory when I see a diatom that I recognise but whose name I have forgotten.  Before this was published, I used a French publication Guide Méthodologique pour la mise en oeuvre de l’Indice Biologique Diatomées which was free to download (I cannot find a link on the web any longer, unfortunately).   Neither of these is the last word in diatom taxonomy, but that was not the point: a lot of the time, I just need a gentle reminder of the right name for the species I am looking at, and I don’t want to have to pore through a pile of books in order to find this.

One of the strong points of both books is that they are copiously illustrated, and the plates are arranged very logically so that similar-shaped diatoms are together, making it easy to pick out differences.   For most routine identification, this is exactly what is needed: we may pretend that we are logical people but, in truth, pattern matching beats using a key nine times out of ten.   The 133 plates in Diatomeen im Süßwasser … act as a visual index and, to make life even easier, the species descriptions are arranged alphabetically and cross-referenced in the plates.  Having found an image that resembles the diatom I am trying to identify, it is straightforward to flick to the description to check the details.

There is just one problem: Diatomeen im Süßwasser-Benthos von Mitteleuropa is in German, and quite technical German at that.   I tell people not to worry because all the images are in English but, in truth, I worry that I may lose some of the nuances due to my linguistic limitations.   I was delighted, then, to be asked by Marco Cantonati to help produce an English version of the book.  Marco is half-German so reads and speaks the language fluently, and I was able to work on his first drafts in English to produce the final text.   Conscious that translating a German book into English is only a partial solution for the almost 70% of the EU who have neither as their first language, we also unpicked the prose in order to put the information about each species into a series of “bullet points” so that it was more accessible and we also took the opportunity to update some of the taxonomy.   A large part of last weekend was spent poring over the proofs so it should not be long now before it is available to purchase.

The great irony for me is that I am putting the finishing touches to this book at the same time as I am helping the Environment Agency to move away from using the light microscope to identify diatoms altogether.   I am just finalising the last of the regular competency tests that I organise in which Environment Agency staff will participate, after which routine samples will be sent off for Next Generation Sequencing rather than being analysed by light microscope.  I’ve written about the pros and cons of this before (see “Primed for the unexpected …”) but there is a funny side.   After over a decade of struggling with identification literature in a language that almost none of them spoke, my dedicated band of Environment Agency analysts get the book they dreamed about two months after their last diatom slide is packed away.   My sense of timing is, as ever, impeccable …

Hofmann, G., Werum, M. & Lange-Bertalot, H. (2011).   Diatomeen im Süßwasser-Benthos von Mitteleuropa. A.R.G. Gantner Verlag K.G., Ruggell.

Prygiel, J.  & Coste, M. (2000).   Guide Méthodologique pour la mise en oeuvre de l’Indice Biologique Diatomées.   NF T 90-354.  Cemagref, Bordeaux.

Tidings of Great Joy …


About three years ago I was one of a small group of people who met in the bowels of the National Museum of Wales in Cardiff to discuss the options for producing an online guide to the freshwater diatoms of Britain and Europe.   There were, we reasoned, good guides to most of the rest of the algae found on these islands, and plenty of guides to the diatoms of continental Europe.  There was also an active community of people interested in diatoms for a variety of reasons, both professional and recreational.   There had also been well-intentioned initiatives in the past, the most recent of which was a CD-ROM that I helped to produce for the Environment Agency a few years ago.  I wrote about that sorry saga in “The decline and fall of a CD-ROM”.

There are good reasons why it has not been possible to produce a good guide to diatoms in the past, not least of which has been the shifting sands of diatom taxonomy, which creates instability for anyone who is trying to collate information on the present state of play or, for that matter, to put names on the myriad forms of diatoms that one sees when peering down a microscope.   A more practical reason, over the last few years, has been the absence of anyone with the time and resources to mastermind a project but that situation was about to change, thanks to the National Museum of Wales, who gave their diatom curator Ingrid Jüttner time to work on the project.   They also had experience of developing online taxonomic aids, and a basic “shell” for a website that could be adapted to our needs.   The missing link was funding to allow others with practical or academic interests in diatom taxonomy to travel and meet up to support the project.  That problem was solved thanks to generous support from the British Phycological Society.


The homepage for the genus Nitzschia in the Diatom Flora of Britain and Ireland.

And so, today, the Freshwater Diatom Flora of Britain and Ireland was launched on the National Museum of Wales website and I encourage you all to have a look.   Comparing the swish tablet-friendly website to our CD-ROM is a salutary experience.   That had to be run from a computer with a CD-ROM drive, which meant that either your microscope had to be close to your desktop or you had a laptop or you were constantly dashing across the laboratory to compare the image on the screen with the specimen under your microscope.  There was, at the time, an edict within the Environment Agency that prohibited desktop computers from laboratories, which further complicated the issue.  Now you can check specimens on an iPad as easily as consult a paper flora.  And that is quite important because, in my experience, there are three levels of biological identification.  First, there is a basic pool of species that you can name from memory, then there are rare and difficult specimens that cannot be identified easily and which require you to consult the literature.   Finally, there is a group of species that fall between these two categories that you recognise but for which you may need a “nudge” to match them to the right name.   For these, an aide mémoire that you can consult easily is invaluable.  I always felt that the Lucid software that drove the CD-ROM was a little too clunky to serve this purpose, but a website accessible via a tablet might approach the functionality of my paper-based identification aids.

The diatom flora has images and descriptions of most freshwater genera, and of the most commonly encountered species.  But there is still a long way to go.   The next couple of years will see us start filling in some of the gaps in order to improve coverage, both in the number of species and the amount of information about each.   At the moment, the focus is on valve morphology, but there is more that could be written about live diatoms and about their ecology, for a start.  But we have made the first steps and, importantly for this modern age, we have burst the old paradigm that regards taxonomic literature as stolid inflexible overviews of the state of the art at a point in time, and emerged blinking into a new era where the medium is flexible enough to accommodate change and evolve as our understanding improves.


The webpage for Nitzschia dissipata in the Diatom Flora of Britain and Ireland, with the description on the left and images on the right.

The decline and fall of a CD-ROM

I’ve just looked back at a paper I wrote for a symposium in 2004. It described a CD-ROM I was helping to develop at the time for the UK’s Environment Agency, to help their analysts to identify diatoms.   The tone of the article is upbeat and positive, eulogising the potential for interactive CD-ROMs for identification.

So much for that.   The Environment Agency has just deleted our key from their publications catalogue.   In so doing they fulfil one of the prophecies in my paper.   The software, quite simply, evolved faster than we were able to respond.   We had a bold vision for a modular project, developing from a first release with about 300 common diatoms found in rivers into, eventually, comprehensive coverage of all diatoms found in Britain and Ireland. We had recognised, too, that the software would have to evolve to keep track of other developments in hardware and software. However, within about a year of the release of the CD-ROM, the Environment Agency’s priorities shifted and (more significantly) funding became much tighter.   Funding for the additional modules never happened.   More importantly, there have been changes in the licensing agreement for the software that we used to develop the key which means that our package would need to be modified if it were to be sold as a user-friendly entity again.   The publications team at the Environment Agency did not consult us before deleting the CD-ROM but, even if they had done, I doubt that there would have been funding available to cover the time needed to upgrade the package.

This saga illustrates some of the pitfalls of using new media. I have, on my bookshelf, a facsimile edition of Friedrich Hustedt’s flora of freshwater diatoms from central Europe, first published in 1930. Many of the taxonomic ideas are now out of date but the illustrations and descriptions are still useful, 86 years after first publication. By contrast, our CD-ROM was obsolete within a decade.   The whole idea of a CD-ROM, indeed, sounds rather quaint in 2014 and, to emphasise the point, I am typing this post on a laptop which does not have a CD-ROM drive.   The latest versions of the Lucid software are aimed at online keys and the prospect of using an iPad or even a mobile phone as a platform for identifying organisms is tantalising (and, indeed, already possible for some groups: see the Field Studies Council publication website).

The same issues about upgrading and maintenance will apply as much to a website as to a CD-ROM. As soon as I stop paying my subscription, my own websites will disappear, along with all the information stored on them.   Large institutions such as the Environment Agency and national museums ought to be more resilient but I fear that it would only take a small shift in an organisation’s priorities or a change in key personnel for an active website to become fossilised or archived.

The good news is that all of the information except the keys themselves is available on the web, courtesy of Steve Juggins’ website at Newcastle University, and that there are plans to develop a new online diatom flora of Britain and Ireland, hosted by the National Museum of Wales.  And, of course, if all these fail, I will still have my trusty copy of Hustedt’s Flora.


One aspect of the CD-ROM that will not be missed: the grim cover of the CD-ROM. Not a single diatom in sight.


Kelly, M.G., Bennion, H., Cox, E.J., Goldsmith, B., Jamieson, B.J., Juggins, S., Mann, D.G. & Telford, R.J. (2006). An interactive CD-ROM for identifying freshwater diatoms. pp. 153-161. Proceedings of the 18th International Diatom Symposium (edited by A. Witkowski). Biopress, Bristol.