The other day I went looking for all the open source data on Covid-19 in the UK so that I could do some modelling on when the peak was likely to be reached and, by inference, when we might hope to be able to resume normal activities.

When Will Lockdown End?

I’ll start with this, and you can ignore the caveats on why it’s nonsense (which is what always happens with mathematical models). If the assumptions are accurate (I can guarantee that they’re not), then by the end of July about 3 million people will have been infected and the first peak will have run its course (see the graph below). If we’re lucky, August will be party season!

Daily confirmed (i.e. hospitalised) cases of Covid-19 and expected deaths over the next few months. Note that these are daily totals, not cumulative figures. (graph: James Kemp)

Caveats

  • The data is incomplete. We’re only testing the sickest people who turn up in hospital, and I’m not even certain that we’re testing everyone in hospital who might have Covid-19.
  • There’s a massive time lag in the data we do have. People reported as testing positive today were probably infected over two weeks ago.
  • There are obvious reporting anomalies in the data (you can see the weekends), which makes it harder to identify trends and extrapolate.
  • The open source data available only really tells us what happened in March.
  • The margin of error is very large.
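
One common way to damp the weekend reporting dips mentioned above is a trailing seven-day moving average. The sketch below is only an illustration of that idea, not something taken from the spreadsheet: the `rolling_mean` helper and the `daily_cases` numbers are made up for the example.

```python
def rolling_mean(values, window=7):
    """Trailing moving average; None until a full window of days is available."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            chunk = values[i + 1 - window : i + 1]
            out.append(sum(chunk) / window)
    return out


# Hypothetical daily confirmed cases for two weeks (NOT real UK data).
# Positions 5, 6, 12 and 13 stand in for under-reported weekend days.
daily_cases = [100, 110, 120, 130, 140, 90, 80, 150, 160, 170, 180, 190, 120, 110]

smoothed = rolling_mean(daily_cases)

for day, (raw, avg) in enumerate(zip(daily_cases, smoothed)):
    label = "n/a" if avg is None else f"{avg:.1f}"
    print(f"day {day:2d}  raw {raw:4d}  7-day mean {label}")
```

Because every seven-day window contains exactly one weekend, the smoothed series rises steadily even though the raw numbers drop every Saturday and Sunday. The price is a few extra days of lag on top of the lag already in the data.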

Building a Covid-19 Model

I’ve built lots of models for a variety of purposes. This one followed a method I’ve worked out from experience.

  1. Get an understanding of the system you want to model
  2. Work out what the outputs you need are
  3. Collect appropriate input data
  4. Look for relationships in the data
  5. Build a model that uses relationships to show how inputs deliver outputs
  6. Test the model against reality
  7. Iterate

I built the model in a spreadsheet because that was the tool I had available on my personal machine. Discrete event simulation software might have been a better approach, but I don’t have that available right now.
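
For anyone who prefers code to a spreadsheet, steps 4 to 6 might look something like the minimal Python sketch below. It is not the calculation in the spreadsheet. It assumes early case counts grow roughly exponentially, fits a straight line to the log of the counts, and then checks the fit against a few held-back days. The `observed` numbers and the function names are invented for illustration.

```python
import math


def fit_exponential(counts):
    """Least-squares fit of log(count) = a + b * day. Returns (a, b)."""
    n = len(counts)
    xs = range(n)
    ys = [math.log(c) for c in counts]
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    b = sxy / sxx
    a = y_mean - b * x_mean
    return a, b


def predict(a, b, day):
    """Predicted count on a given day under the fitted exponential."""
    return math.exp(a + b * day)


# Made-up daily case counts (NOT real UK data).
observed = [50, 61, 72, 88, 105, 130, 155, 185, 210, 230]

# Step 4 and 5: find the relationship and build the model on the first 7 days.
train, holdout = observed[:7], observed[7:]
a, b = fit_exponential(train)
doubling_time = math.log(2) / b

# Step 6: test against "reality", here the 3 days held back from the fit.
errors = [
    abs(predict(a, b, len(train) + i) - actual) / actual
    for i, actual in enumerate(holdout)
]
mape = sum(errors) / len(errors)

print(f"doubling time: {doubling_time:.1f} days")
print(f"mean absolute percentage error on held-back days: {mape:.1%}")
```

Step 7 is then a matter of changing the assumption that fits worst (here, that growth stays exponential) and running the comparison again.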

Can You Build a Better Covid-19 Model?

In the interest of transparency, here’s the spreadsheet that I built the model in, with all the data already in it. It’s a rough version, without much in the way of notes, but you can see the working if you are interested in this sort of thing. It’s not a work of beauty; I pulled it together in a few hours on Good Friday.

Sources

Assumptions

Some assumptions came out of the following articles.

Assumption | Value | Unit | Source
Incubation period (mean) | 4 | days | https://www.nejm.org/doi/full/10.1056/NEJMoa2002032
Hospitalisation lag after symptoms | 10 | days | Unknown, range 7-10 days, couldn’t find peer-reviewed article.
Test result reporting lag (mean) | 3 | days | Based on performance in NE Surrey, also reporting patterns
Average time in hospital | 13 | days | https://www.nejm.org/doi/full/10.1056/NEJMoa2002032
Proportion needing hospitalisation | 18.6% | of total cases | https://jamanetwork.com/journals/jama/fullarticle/2762130
Proportion needing critical care | 4.7% | of total cases | https://jamanetwork.com/journals/jama/fullarticle/2762130
Proportion needing a ventilator | 66.00% | of hospitalised | https://www.bmj.com/content/368/bmj.m1201
Median time in critical care | 5 | days | https://www.bmj.com/content/368/bmj.m1201
Time to conclusion (death or recovery) | 23 | days |
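
To show how a few of these assumptions combine, here is a small Python sketch of the arithmetic. The values come straight from the table, but the way they are chained together is an assumption made for this example and may not match the spreadsheet: the three lags are simply added, and confirmed (hospitalised) cases are scaled up by the hospitalisation proportion to estimate total infections. The figure of 1,000 confirmed cases is hypothetical.

```python
# Values copied from the assumptions table.
INCUBATION_DAYS = 4
HOSPITALISATION_LAG_DAYS = 10
REPORTING_LAG_DAYS = 3
PROPORTION_HOSPITALISED = 0.186   # of total cases
PROPORTION_CRITICAL_CARE = 0.047  # of total cases


def infection_to_report_lag():
    """Days from infection to a reported positive test (assumes the lags simply add)."""
    return INCUBATION_DAYS + HOSPITALISATION_LAG_DAYS + REPORTING_LAG_DAYS


def estimated_total_cases(confirmed_hospital_cases):
    """Scale hospital-confirmed cases up to total infections (an assumption of this sketch)."""
    return confirmed_hospital_cases / PROPORTION_HOSPITALISED


def estimated_critical_care(confirmed_hospital_cases):
    """Expected critical care cases implied by the same total."""
    return estimated_total_cases(confirmed_hospital_cases) * PROPORTION_CRITICAL_CARE


hypothetical_confirmed = 1000  # made-up number, not from the data

print(f"infection-to-report lag: {infection_to_report_lag()} days")
print(f"estimated total cases: {estimated_total_cases(hypothetical_confirmed):.0f}")
print(f"estimated critical care cases: {estimated_critical_care(hypothetical_confirmed):.0f}")
```

Adding the lags gives 17 days, which lines up with the caveat that people testing positive today were probably infected over two weeks ago.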