Last week saw a small career achievement as I sent out the results of the 50th diatom ring-test that I organise for diatom analysts in the UK and Ireland. “Ring-test” is the informal term for an inter-laboratory comparison, in which two or more laboratories analyse the same sample and compare their results. We began regular ring-tests in 2007 for everyone analysing diatom samples for assessments associated with the Water Framework Directive, sending out five slides each year to staff in the UK and Irish environment agencies and the contractors who worked with them. Now, a decade later, the scheme is still going strong, with participants from Germany, Sweden and Estonia joining the British and Irish contingents.
There are a number of similar schemes around Europe with the same basic model: the organiser sends out slides made from the same sample, all participants analyse their slide and send in their results, and the organiser collates them. There are usually one or more designated “experts” against whose results everyone else is judged. Most of the other schemes then organise a workshop at which participants gather to discuss the finer points of diatom taxonomy. We have held workshops in the past, but these are not directly linked to the ring-tests. Instead, we send out a report that summarises the results and provides notes on the identification of difficult or unusual taxa. The money we save on workshops means that we can circulate more slides. I’m a great believer in “little and often” for this type of quality control.
A second feature of our scheme (which some of the other European schemes have now also adopted) is that a panel of experienced analysts provides the benchmark that other participants should achieve. This gives us an estimate of both the average result and the variation around it. We learned early on that some samples gave much less variable results than others, even when the analyses were performed by experienced analysts. We use this knowledge to adjust the size of the “target” that participants must hit. The graphs below show the results for our most recent test. The horizontal blue lines on the left-hand graph show two standard deviations either side of the mean of the “expert” analyses (expressed as TDI). This is the “warning limit”: if an analyst’s result falls outside it, he or she should look back over the analysis to see whether any mistakes have been made. The red lines mark the “action limit”, seven TDI units either side of the expert mean. We know from other studies (see lower graph, left) that two replicate analyses of the same sample very rarely differ by more than this, so analysts whose results fall outside the action limit should definitely be checking them.
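The two limits are simple enough to express in a few lines. The sketch below is a minimal illustration, not the scheme’s own software: it assumes the warning limit uses the sample standard deviation of the individual expert TDI values (the text does not say which estimator is used), and the function names and the expert TDI values are hypothetical.

```python
from statistics import mean, stdev

# Action limit from the text: 7 TDI units either side of the expert mean.
ACTION_HALF_WIDTH = 7.0


def ring_test_limits(expert_tdi):
    """Return (expert mean, warning half-width, action half-width) in TDI units.

    Assumes the warning limit is two *sample* standard deviations of the
    individual expert TDI values.
    """
    centre = mean(expert_tdi)
    warning = 2 * stdev(expert_tdi)
    return centre, warning, ACTION_HALF_WIDTH


def classify(participant_tdi, expert_tdi):
    """Place one participant's TDI relative to the warning and action limits."""
    centre, warning, action = ring_test_limits(expert_tdi)
    difference = abs(participant_tdi - centre)
    if difference > action:
        return "beyond action limit"
    if difference > warning:
        return "beyond warning limit"
    return "within warning limit"


# Hypothetical expert-panel TDI values for one slide.
experts = [52.0, 54.0, 55.0, 53.0, 56.0]
for tdi in (55.0, 58.0, 62.0):
    print(tdi, classify(tdi, experts))
# 55.0 within warning limit
# 58.0 beyond warning limit
# 62.0 beyond action limit
```

Because the warning half-width is derived from the experts’ own spread, it widens automatically for slides that even experienced analysts find difficult, while the action limit stays fixed at seven TDI units.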
The results of the 50th UK / Ireland diatom ring test, showing (left) the difference in TDI and (right) the number of taxa (N taxa) between experts and other participants. Blue lines: expert panel’s mean TDI ± two standard deviations; red lines: expert panel’s mean TDI ± 7. Note that it is unusual for the between-analyst variability to be as low as it was for this slide.
The reason why we need flexible “warning limits” is illustrated in the right-hand graph below, which shows the similarity between two counts as a function of the diversity of the samples. The relationship is wedge-shaped (illustrated by the blue line, the regression line through the 90th percentile of the data). There are a number of reasons why two analysts are unlikely to get identical results, one of which is that they disagree on the identities of the taxa that they encounter (the reason why we run these audits in the first place). But the wedge shape also tells us that there seems to be an upper limit to the similarity that can be achieved at any given diversity. This is an inherent stochastic property of the data and has nothing to do with the competence of the analysts.
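A line through the 90th percentile of a scatter is what linear quantile regression produces, so that is one plausible way to draw the upper edge of the wedge; the text does not say which software or method was used. The sketch below is a dependency-free illustration for small datasets: it relies on the property that an optimal straight-line quantile fit can be chosen to pass through two observations with different x values, so trying every such pair gives an exact answer. The function names and the example data are hypothetical.

```python
from itertools import combinations


def pinball_loss(residuals, tau):
    """Quantile ('check') loss: tau*r for r >= 0 and (tau - 1)*r for r < 0."""
    return sum(tau * r if r >= 0 else (tau - 1) * r for r in residuals)


def quantile_line(x, y, tau=0.9):
    """Fit y = intercept + slope * x at quantile tau by exhaustive search.

    An optimal straight-line quantile fit can always be chosen to pass
    through two observations with different x, so trying every such pair
    gives an exact answer. Cost grows as O(n**3): fine for tens of points.
    """
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # two points with the same x cannot fix a slope
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        residuals = [yk - (intercept + slope * xk) for xk, yk in zip(x, y)]
        loss = pinball_loss(residuals, tau)
        if best is None or loss < best[0]:
            best = (loss, intercept, slope)
    if best is None:
        raise ValueError("need at least two distinct x values")
    return best[1], best[2]


# Hypothetical (diversity, similarity %) pairs, not the Northern Ireland data.
diversity = [5, 8, 12, 15, 20, 25, 30, 35]
similarity = [92, 88, 85, 70, 78, 60, 65, 55]
intercept, slope = quantile_line(diversity, similarity, tau=0.9)
print(f"90th percentile line: similarity = {intercept:.1f} + {slope:.2f} * diversity")
```

For larger datasets a dedicated quantile-regression routine (for example one based on linear programming) would be the more practical choice; the brute-force search is used here only because it is short and exact.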
Left: some of the data from which the “action limit” for the ring-tests was established. These are the results of audits of 67 samples from Northern Ireland in which the original (“primary”) analysis was checked against the result of an independent (“audit”) analysis. Right: The effect of diversity on the similarity between primary and audit analyses for the same dataset.
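The claim that the ceiling on similarity is stochastic, rather than a matter of competence, can be explored with a small simulation. The sketch below assumes two “perfect” analysts who never misidentify anything, each counting 300 valves at random from the same slide; a community in which every taxon is equally abundant; and Bray-Curtis similarity as the comparison measure. None of these choices is taken from the scheme itself, and the function names are hypothetical.

```python
import random
from collections import Counter


def count_sample(proportions, n_valves, rng):
    """Simulate one analyst's count: n_valves drawn at random from the community."""
    taxa = list(range(len(proportions)))
    return Counter(rng.choices(taxa, weights=proportions, k=n_valves))


def bray_curtis_similarity(a, b):
    """Bray-Curtis similarity (0 to 1) between two Counters of valve counts."""
    shared = sum(min(a[t], b[t]) for t in set(a) | set(b))
    return 2 * shared / (sum(a.values()) + sum(b.values()))


def mean_replicate_similarity(n_taxa, n_valves=300, n_pairs=200, seed=1):
    """Average similarity between pairs of error-free counts of the same slide."""
    rng = random.Random(seed)
    proportions = [1 / n_taxa] * n_taxa  # even community: an assumption
    sims = [
        bray_curtis_similarity(count_sample(proportions, n_valves, rng),
                               count_sample(proportions, n_valves, rng))
        for _ in range(n_pairs)
    ]
    return sum(sims) / len(sims)


for k in (5, 20, 50):
    print(k, "taxa:", round(mean_replicate_similarity(k), 3))
```

Mean similarity falls as the number of taxa rises even though nothing has been misidentified: with more taxa sharing a fixed count, each taxon is represented by fewer valves, so its count is relatively noisier from one analysis to the next.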
A further way in which our scheme differs from others is that no-one “passes” or “fails”. That might seem counter-intuitive, as this is supposed to be a test of competency. A regular reader of this blog, however, will understand that absolute truth is often elusive when it comes to identifying diatoms and other algae. The hard objectivity needed for a real test of competency always has to be tempered by recognition of the limitations of our craft. Moreover, turning the ring-test into a calibration exercise runs the risk of turning the analysts into machines. Instead, we use the term “reflective learning”, encouraging participants to use the reports to judge their own performance relative to the experts, and to take their own corrective action.
Some of the organisations whose analysts participate use the ring-test as part of their own quality control systems, and will take corrective action if results stray beyond the action limit. That seems to be a sensible compromise: quality control should be the responsibility of individual laboratories rather than being delegated to third parties. At the same time, organisations need to understand that the people who perform ecological analyses are professionals, and should not be treated as if they were one more machine in a laboratory that needs to be calibrated.
If you are interested in joining the UK / Ireland diatom ring-test scheme, or just want to learn a little more about it, get in touch with me and I’ll do my best to answer your questions.