As if through a glass darkly …

Life used to be so easy: I stared down my microscope, named the diatoms I could see, counted them and, from these data, evaluated the quality of the ecosystem that I was studying.  Along with the majority of my fellow diatomists, I conveniently ignored the fact that I was looking at dead cell walls rather than living organisms.  My work on molecular barcodes as an alternative to traditional microscopy has been revelatory as I try to reconcile these two types of data.  At one level, what I see down the microscope is a benchmark for what I should expect to see in my barcode output.  Yet, at the same time, the differences between the two types of data show up the limitations of traditional data – and the assumptions that underpin the ways that we work.

Take a look at the plate below, which shows two of the most common diatoms in UK rivers: Ulnaria ulna is one of the largest that I encounter regularly, whilst Achnanthidium minutissimum is often one of the most abundant in my samples, particularly when the level of human pressure is relatively low.  When we analyse samples with the light microscope, we record individuals, so both of these score “1” in my data book despite the fact that U. ulna is about 100x larger (by volume) than A. minutissimum.
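The scale of that disparity is easy to illustrate with some back-of-envelope arithmetic.  The cell dimensions below are rough, hypothetical values for the two species, not measurements taken from the plate, and the box-shaped approximation is the crudest possible model of a diatom cell:

```python
# Illustrative arithmetic only: the cell dimensions below are rough
# assumptions for the two species, not measurements from the plate.

def biovolume_box(length_um, width_um, depth_um):
    """Crude box approximation of a cell's biovolume in cubic micrometres."""
    return length_um * width_um * depth_um

u_ulna = biovolume_box(200, 8, 8)         # a large Ulnaria ulna cell
a_minutissimum = biovolume_box(12, 3, 3)  # a small Achnanthidium minutissimum cell

# Both species score "1" in a count, yet their biovolumes differ enormously
print(f"Biovolume ratio: {u_ulna / a_minutissimum:.0f}x")
```

Even with these invented dimensions, the ratio comes out at roughly two orders of magnitude, which is why a count of “1” for each species carries so little information about their respective contributions to the biofilm.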

Specimens of Ulnaria ulna (top) and Achnanthidium minutissimum (bottom).  Both are from cultures used for obtaining sequences for the reference library for our molecular barcoding project.   Scale bar: 10 µm.   Photographs: Shinya Sato, Royal Botanic Gardens, Edinburgh.

When we analyse a sample using Next Generation Sequencing (NGS), we count not cell walls but copies of the rbcL gene, which provides the blueprint for Rubisco, a key photosynthetic enzyme.  As I write, there is no clear understanding of how the number of rbcL copies relates to the number of individuals.  We know that each chloroplast within a cell will have at least one copy of this gene, and usually several.  There is also some evidence that larger chloroplasts have more copies of the gene than smaller ones, and there is likely to be a measure of environmental control as well.  The key message that I try to get across in my talks is that NGS data are different to the data we are used to gathering using microscopy.  These differences do not mean that the new data are wrong, just that we need to leave some of our preconceptions behind before starting to interpret them.
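To make the distinction concrete, here is a minimal sketch of how read counts and cell counts can diverge.  Both the read counts and the rbcL copy numbers below are invented for illustration; as noted above, real copy numbers are poorly known and probably vary with chloroplast size and environment:

```python
# Hypothetical read counts from an NGS run for two species
reads = {"Ulnaria ulna": 9000, "Achnanthidium minutissimum": 1000}

# Assumed rbcL copies per cell (chloroplasts x copies per chloroplast);
# these are placeholder values, since real copy numbers are poorly known
copies_per_cell = {"Ulnaria ulna": 30, "Achnanthidium minutissimum": 4}

# Correcting read counts by copy number gives a very different picture
cells = {sp: reads[sp] / copies_per_cell[sp] for sp in reads}
total_cells = sum(cells.values())
total_reads = sum(reads.values())
for sp in reads:
    print(f"{sp}: {100 * reads[sp] / total_reads:.0f}% of reads, "
          f"{100 * cells[sp] / total_cells:.0f}% of estimated cells")
```

With these invented figures, a species contributing 90% of the reads accounts for only a little over half of the estimated cells; the point is not the particular numbers but that the two currencies are not interchangeable.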

However, we could also argue that counting the number of copies of the gene for an important photosynthesis enzyme should be giving us a better insight into the contribution of a species to primary productivity than counting the number of cell walls.  In other words (whisper this …), rbcL might not just be different, it might be better, especially if our purpose is to understand the contribution the various species in the biofilm make to primary productivity in stream ecosystems.  At the moment there are plenty of problems with the NGS-based method, not least the fact that we often cannot assign half the copies of the rbcL gene in a sample to a species, but the situation is improving all the time …

Some recent work pushes this a little further.  Jodi Young and colleagues at Princeton University have demonstrated large variation in the kinetics of Rubisco in diatoms, and in their carbon-concentrating mechanisms (see “Concentrating on carbon …” for more about these).  Although their work is focussed on marine phytoplankton, the variation within Rubisco and carbonic anhydrases could go some way to explaining the sensitivity of diatoms to inorganic carbon (see “Ecology in the Hard Rock Café …”).  In other words, rbcL is not an irrelevant DNA sequence, as the term “barcode” may imply (in contrast to barcodes based on the ITS region, for example); it is deeply implicated in the reasons why a species lives in a particular place.

And yet, and yet, and yet …  The same could be argued for morphology, up to a point at least.  The shape of a Gomphonema or a Navicula also helps us to understand the organism’s relationship with its environment.  The problem is that modern taxonomists tend to focus on a much finer level of detail – on the arrangement and structure of the various pores on the silica frustule, for example – and offer few insights into what these minute differences mean in terms of the ecophysiology of the organisms.  Even at the whole-cell scale, information on habit, which is linked to form (Gomphonema tending to live on stalks or short mucilage pads secreted from their foot poles for at least part of their life-cycle, for example), is rarely incorporated into assessment systems.  The move from light microscopy to NGS, in other words, means replacing an imperfect system with which we are familiar with one that we are still learning to understand.  Both offer unique information, and the gains from using one approach rather than the other will be offset by losses of insight.

That leaves us with two big challenges over the next couple of years, as UK diatom-based assessments move from light microscopy to NGS.  The first is to work harder to understand what NGS outputs are actually telling us about the environment, over and above the minimalist ecological status indices that spew out of our “black box” computer programs.  The second is to maintain an understanding of the properties of whole organisms and how these interact with one another and with their environments.  I guess I should add a third challenge to this pair: persuading middle managers, who have at best a sketchy understanding of diatoms and phytobenthos and whose budgets are already stretched, that any of this matters …

References

Badger, M.R. & Price, G.D. (1994).  The role of carbonic anhydrase in photosynthesis.  Annual Review of Plant Physiology and Plant Molecular Biology 45: 369-392.

Young, J.N. & Hopkinson, B.M. (2017).  The potential for co-evolution of CO2-concentrating mechanisms and Rubisco in diatoms.  Journal of Experimental Botany doi: 10.1093/jxb/erx130.

Young, J.N., Heureux, A.M.C., Sharwood, R.E., Rickaby, R.E.M., Morel, F.M.M. & Whitney, S.M. (2016).  Large variation in the Rubisco kinetics of diatoms reveals diversity among their carbon-concentrating mechanisms.  Journal of Experimental Botany 67: 3445-3456.


Ecology’s Brave New World …

My travels have brought me to the kick-off conference of DNAqua-net at the University of Duisburg-Essen in Germany, to give a plenary talk on our progress towards using high-throughput next generation sequencing (NGS) for ecological assessment.  I went into the meeting feeling rather nervous, as I had never given a full-length talk to an audience of molecular ecologists before, but it was clear, even before I stood up, that we were in the almost unique position of having a working prototype that was under active consideration by our regulatory bodies.  Many of the earlier speakers showed promising methods, but few had reached the stage where adoption for nationwide implementation was a possibility.  There was, as a result, an audible intake of breath when I mentioned, during my talk, that, from 2017, samples would no longer be analysed by light microscopy but only by NGS.

That, in turn, brought some earlier comments by Florian Leese, DNAqua-net chair, into sharp focus.  He had talked about managing the transition from “traditional” ecology to the Brave New World of molecular techniques; something that weighs heavily on my mind at the moment.   In fact, I said, in my own talk, that the structures and the values of the organisations that were implementing NGS were as important as the quality of the underlying science.   And this, in turn, raised another question: what is an ecologist?

If that sounds too easy, try this: is an ecologist more than just someone who collects ecological data?  I have put the question like this because one likely scenario for environmental DNA, once in routine use, is that sampling will be delegated to lowly technicians who will dispatch batches to large laboratories equipped with the latest technology for DNA extraction, amplification and sequencing on an enormous scale (see “Replaced by a robot?”), and the results will be fed into computer programs that generate the answer to the question that is being posed.

The irony, for me, is that the leitmotif of my consultancy since I started has been helping organisations apply ecological methods consistently across the whole country, so that the results generated represent real differences in the state of the environment and not variations in the practice or competence of the ecologists who collected the data.  Over the past decade, I helped co-ordinate the European Commission’s intercalibration exercise, which extended the horizons of this endeavour to the extremities of the European Union.  The whole process of generating ecological information had to be broken down into steps, each taken apart, examined and put back together to, we hoped, produce a more effective outcome.  There was, nonetheless, ample opportunity for the ecologist to bring higher cognitive skills to the process: in sampling and surveying, in species identification and, ultimately, in interpreting the data.

I often use the example of McDonalds as a model for what we are trying to achieve, simply because it is a brand with which everyone is familiar and we all know that their products will taste the same wherever we go (see “Simplicity is the ultimate sophistication …”).  I admire them for that, because they have achieved what ecologists involved in applying EU legislation should desire most: a completely consistent approach to a task across a territory.  But that same consistency means that one is never tempted to pop into a McDonalds on the off chance that the chef has popped down to the market to buy some seasonal vegetables with which to whip up a particularly appetising relish.  If you want the cook to have used his or her higher cognitive abilities to enhance your dining experience, you do not go to a McDonalds.

But that is where we could end up as we go down the road of NGS.  A reader of my post “A new diatom record from West Sussex” commented tartly that there would be no chance of that diatom being spotted once the Environment Agency replaced their observant band of diatom analysts with NGS, and he was right.  Another mentioned that he had recently passed on a suspicion of a toxic pollution event to the local staff, based on observations of the sample that were not captured by the metrics used to classify ecological status.  Again, those insights will not be possible in our Brave New World.

Suppose we were somehow able to run a Monte-Carlo permutation test on all the possible scenarios of where we might be in twenty years, in terms of the application of NGS to ecological assessment.  Some of those outcomes would correspond to Donald Baird’s vision of “Biomonitoring 2.0”, but some would not and here, for the sake of playing Devil’s Advocate, is a worst-case scenario:

In an effort to reduce costs, a hypothetical environmental regulator outsources eDNA sampling to a business service company such as Group 4 or Capita.   They batch the samples up and dispatch them to the high throughput laboratory that provides the lowest quote.   The sequencing results are uploaded straight to the Cloud and processed according to an automated “weight of evidence” template by data analysts working out of Shanghai, Beijing or Hyderabad before being passed back to staff in the UK.   At no point is a trained ecologist ever required to actually look at the river or stream.  I should stress that this “year zero” scenario will not come about because NGS is being used but because of how it is used (and a post in the near future will show how it is possible to use NGS to enhance our understanding of the UK’s biodiversity).   It brings us back to the question of the structure and values of the organisation.

What I would like to see is a system of ecological assessment that makes full use of the higher cognitive abilities of the biologists responsible for it.  Until now, much of a biologist’s skill has gone into identifying organisms in order to compile the list of species upon which assessments are based.  It should be possible to use the new genetic technologies to free ecologists to play a greater role in interpretation and decision-making.  However, that will not come about where these technologies are deployed with an overwhelming desire to reduce costs.  One of the lessons that we need to learn, in other words, is that there is more to applying molecular ecology than simply developing the method itself.

Reference

Baird, D.J. & Hajibabaei, M. (2012). Biomonitoring 2.0: a new paradigm in ecosystem assessment made possible by next-generation DNA sequencing. Molecular Ecology 21: 2039-2044.

 

It’s just a box …

Linocut of an Illumina MiSeq Next Generation Sequencer.

Today’s post starts with a linocut of an Illumina MiSeq Next Generation Sequencer (NGS), as part of an ongoing campaign to demystify these state-of-the-art £80,000 instruments.  It’s just a box stuffed with clever electronics.  The problem is that tech-leaning biologists go misty-eyed at the very mention of NGS, and start to make outrageous claims for what it can do.  But how much are these machines actually going to change the way that we assess the state of the environment?  I approach this topic as an open-minded sceptic (see “Replaced by a robot?” and “Glass half full or glass half empty?”, among other posts), but I have friends who know which buttons to press, and in what order.  Thanks to them, enough of my samples have been converted into reams of NGS data for me now to be in a position to offer an opinion on their usefulness.

So here are three situations where I think that NGS may offer advantages over “traditional” biology:

  1. Reducing error / uncertainty when assessing variables with highly-contagious distributions
    Many of the techniques under consideration measure “environmental DNA” (“eDNA”) in water samples.  eDNA is DNA released into water from skin, faeces, mucus, urine and a host of other sources.  In theory, we no longer need to hunt for Great Crested Newts in ponds (a process with a high risk of “type 2 errors” – “false negatives”) but can take water samples and detect the presence of newts in the pond directly from these.  The same logic applies to lake fish, many of which move around the lake in shoals, which may be missed by a sampler’s nets altogether or give false estimates of true abundance.  In both of these cases, the uncertainties in traditional methods can be reduced by increasing effort, but this comes at a cost, so methods based on eDNA show real potential (the Great Crested Newt method is already in use).
  2. Ensuring consistency when dealing with cryptic / semi-cryptic species
    I’ve written many posts about the problems associated with identifying diatoms.  We have ample evidence, now, that there are far more species than we thought 30 years ago.  This, in turn, challenges our ability to create consistent datasets when analysts spread across several different laboratories are trying to make fine distinctions between species based on a very diffuse literature.  Those of us who study diatoms now work at the very edge of what can be discriminated with the light microscope, and the limited data we now have from molecular studies suggest that there are sometimes genetic differences even when it is almost impossible to detect variation in morphology.  NGS has the potential to reduce the analytical error that results from these difficulties although, it is important to point out, many other factors (spatial and temporal) contribute to the overall variation between sites and, therefore, to our understanding of the effect of human pressures on diatom assemblages.
  3. Reducing costs
    This is one of the big benefits of NGS in the short term.  The reduction in cost is partly a result of the expenses associated with tackling the first two points by conventional means.  You can usually reduce uncertainty by increasing effort but, as resources are limited, this increase in effort means channelling funds that could be used more profitably elsewhere.  However, there will also be a straightforward time saving, because of the economies of scale that accompany high-throughput NGS.  A single run of an Illumina MiSeq can process 96 samples in a few hours, whereas each would have required one to two hours of analysis by light microscopy.  Even when the costs of buying and maintaining the NGS machines are factored in, NGS still offers a potential cost saving over conventional methods.
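The economies of scale in that third point can be put in back-of-envelope terms.  The microscopy figures come from the paragraph above; the hands-on time for an NGS batch is my own assumption, and a real costing would also need to include reagents, machine depreciation and bioinformatics support:

```python
# Analyst time for one batch of 96 samples, using the figures quoted above:
# 1-2 hours per sample by light microscopy vs a single batched MiSeq run.
samples = 96
microscopy_hours = samples * 1.5   # midpoint of the 1-2 hours per sample
ngs_hands_on_hours = 8             # assumed hands-on time per batch (illustrative)

print(f"Microscopy: {microscopy_hours:.0f} analyst-hours per batch")
print(f"NGS:        {ngs_hands_on_hours} hands-on hours per batch")
print(f"Saving:     {microscopy_hours - ngs_hands_on_hours:.0f} analyst-hours")
```

Even if my assumed hands-on figure is off by a factor of two or three, the batch still releases over a hundred analyst-hours; the real question, as the rest of this post argues, is what those hours are then used for.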

It is worth asking whether these three scenarios – statistical, taxonomic and financial – really amount to better science, or whether NGS is just a more efficient means of applying the same principles (“name and count”) that underpin most ecological assessment at present.  From a manager’s perspective, less uncertainty at lower cost is a beguiling prospect.  NGS may, as a result, give greater confidence in decision-making, according to the current rules.  That may make for better regulation, but it does not really represent a paradigm shift in the underlying science.

The potential, nonetheless, is there.  A better understanding of genetic diversity, for example, may make it easier to build emerging concepts such as ecological resilience into ecological assessment (see “Baffled by the benthos (2)” and “Making what is important measurable”).  Once we have established NGS as a working method, maybe we can assess functional genes as well as taxonomic composition?  The possibilities are endless.  The Biomonitoring 2.0 group is quick to make these claims.  But it is important to remember that, at this stage, they are no more than possibilities.  So far, we are still learning to walk …

Reference

Baird, D.J. & Hajibabaei, M. (2012). Biomonitoring 2.0: a new paradigm in ecosystem assessment made possible by next-generation DNA sequencing. Molecular Ecology 21: 2039-2044.