Missed 16,000 Covid cases thanks to an Excel error? You need a data literacy lesson
As has been widely reported, a technical glitch with the NHS Test and Trace system last week meant that almost 16,000 positive Covid-19 cases were not recorded in the official data.
The issue lay with Public Health England’s use of Microsoft Excel to transfer lab results into its reporting dashboards. The results were stored in an older spreadsheet format, the legacy .xls, which caps each worksheet at 65,536 rows and 256 columns, far below the limits of the modern .xlsx format. Rows beyond that cap were simply dropped, so the cases they contained were never counted.
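To make that failure mode concrete, here is a minimal Python sketch of the kind of guard that fails loudly when a dataset exceeds the legacy worksheet capacity instead of letting rows vanish. It assumes pandas and a hypothetical lab_results.csv feed; the file names and the check_fits helper are illustrative, not PHE's actual pipeline.

```python
import pandas as pd

# Hard capacity of a legacy .xls (BIFF) worksheet; .xlsx allows
# 1,048,576 rows x 16,384 columns, which is why the format choice matters.
XLS_MAX_ROWS, XLS_MAX_COLS = 65_536, 256

def check_fits(df: pd.DataFrame, max_rows: int, max_cols: int) -> None:
    """Raise instead of letting an export silently truncate the data."""
    rows, cols = df.shape
    if rows + 1 > max_rows or cols > max_cols:  # +1 accounts for the header row
        raise ValueError(
            f"{rows} rows x {cols} columns will not fit the target format "
            f"(limit {max_rows} x {max_cols}); rows would be silently lost."
        )

results = pd.read_csv("lab_results.csv")            # hypothetical input feed
check_fits(results, XLS_MAX_ROWS, XLS_MAX_COLS)     # fails loudly, never truncates
results.to_excel("daily_report.xlsx", index=False)  # modern format, higher limits
```

The specific numbers matter less than the principle: an export that can lose rows should refuse to run rather than truncate silently.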
This technical limitation meant data went missing from the reporting pipeline. Without the full set of results, PHE could not report an accurate count of positive cases and was unable to send out the relevant contact-tracing alerts.
The Test and Trace system is undoubtedly a powerful weapon in preventing the spread of coronavirus, and it’s a timely example of the value of collecting and analysing data. But this situation highlights just how important it is for governments and big companies to ensure that they have adequate infrastructure and skills in place to handle data assets, especially on large-scale projects that have a direct impact on people’s lives.
It’s important to recognise that, while the technical fault in this case lay in the use of Excel, the issue here is not with that program. Rather, what this fiasco demonstrates is the urgent need for data literacy, and for understanding the impact of technology choices, across all levels of government and organisations. One poor decision to export the database to an Excel sheet had a knock-on effect that ultimately resulted in thousands of cases going uncounted.
This issue could have been avoided with better data engineering at the top of the pipeline: in other words, a system that didn’t require the data to be exported to Excel to be analysed. Excel is a tool built for quick tabular insights, useful for individuals and small businesses, not for complex, collaborative querying at the scale at which the Test and Trace system operates. It was simply not the right tool for the job.
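As a sketch of what better engineering at the top of the pipeline could look like, the snippet below appends each incoming batch of lab results straight into a database table and lets dashboards query it directly, so no spreadsheet row limit ever enters the picture. It uses Python, pandas and SQLite purely for illustration; the table and column names (positive_results, specimen_date) are hypothetical.

```python
import sqlite3
import pandas as pd

# Hypothetical ingest step: append each lab's CSV feed straight into a
# relational table so no spreadsheet row limit ever applies.
conn = sqlite3.connect("test_and_trace.db")

new_batch = pd.read_csv("lab_results.csv", parse_dates=["specimen_date"])
new_batch.to_sql("positive_results", conn, if_exists="append", index=False)

# Dashboards and contact tracers query the table directly rather than a
# manually exported spreadsheet.
daily_counts = pd.read_sql(
    "SELECT specimen_date, COUNT(*) AS positives "
    "FROM positive_results GROUP BY specimen_date ORDER BY specimen_date",
    conn,
)
print(daily_counts.tail())
conn.close()
```

A production system would of course use a managed database and proper schema validation, but even this toy version removes the manual export step where the truncation happened.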
However, the error would have also been avoided if those using the data had known the limitations of the Excel format and the underlying data pipeline and been able to mitigate the risks. Clearly, they didn’t.
The lesson is that everyone involved in projects of this magnitude needs to be educated in data. Whether it’s a basic understanding of file formats, knowledge of how data is stored for analysts, or advanced engineering for software-related tasks, the people working on these projects need to speak the same language. That shared understanding allows them to communicate the trade-offs and implications of anything they do to the data for those further down the pipeline. If they cannot, what seems like a trivial error to one person can lead to massive incidents such as the Test and Trace failure, damaging the credibility of the system itself.
The more we rely on data science to help resolve the huge societal challenges we are facing, from tracking the disease to provisioning resources and determining exam results, the more serious the consequences when those systems are built or used incorrectly. Time for a data literacy crash course, for everyone.