Archive | April, 2020

Why I don’t believe that 2.5-4% of people in Santa Clara county have had COVID19

19 Apr

Originally posted on April 19th. Small edits on April 21st. Thanks to Scott Roy and Dmitri Petrov for comments. 

This week a study was published by researchers from Stanford about how many people in Santa Clara county have been infected with the new coronavirus SARS-CoV2.

You may not have heard of Santa Clara county, but it’s the heart of Silicon Valley and its most famous residents are Stanford, Google, Apple and IBM. I lived there too for a few years when I was a postdoc at Stanford.

The researchers used a new test to detect antibodies against the virus. Antibody testing is going to be super important in the near future, but I have serious concerns about this study and its conclusions.

The main result of the paper is an estimate that between 2.5 and 4% of people in Santa Clara county have had COVID19. That would mean between 48,000 and 81,000 people. If this is correct, the virus has infected many more people in Santa Clara than the official numbers suggest (50-85 fold more).

If 50-85 times sounds hard to believe, that’s because it is. Even though most experts agree that the real number of infected people is higher than the reported numbers, 50-85 fold higher than reported would be quite crazy. In research, we like to say that “extraordinary claims require extraordinary evidence” (https://en.wikipedia.org/wiki/Sagan_standard). Here the claim is extraordinary but the evidence isn’t. We also learn that a study coming from a great university is no guarantee that the study is good.

2.5-4% seroprevalence is unlikely in Santa Clara county

Why is 2.5-4% positive in Santa Clara county an extraordinary claim? This is because in the European countries where seroprevalence is around 3%, many more people have died (relative to the size of the population) than in Santa Clara county. It would be very unlikely that the infection fatality rate (how likely you are to die when you catch the virus) is significantly lower in Santa Clara than in other parts of the world. For example, The Netherlands also reports a 3% seroprevalence, but has 5.5 times as many deaths per 100,000 people compared to Santa Clara county.

Two issues: biased sample and false positives

In my opinion, there are two main issues with this study. Both lead the Stanford researchers to overestimate the number of people who were infected with SARS-CoV2.

One is that this was probably NOT a random sample. And two is that the false positive rate for this kind of test is high. This means that we don’t know if the people who had a positive test result have really been infected with the virus.

  1. Why is this not a random sample? 

They asked people to volunteer for this study using Facebook ads. Now, I think there is nothing wrong -in principle- with using Facebook ads to recruit people. But I do think that people who have been sick with a fever and cough recently are more likely to volunteer for this study to test whether they’ve had COVID19!

If people who actually had COVID19 were twice as likely to volunteer for the study, it would mean 2x as many positive tests in the sample and thus the conclusion that 2x as many people in the county of Santa Clara have had the disease.

This is why it is so important in statistics to have what we call “unbiased samples.”

  2. What is the “false positive rate” and why does it matter? 

Whenever you do an antibody test to see if someone has had a disease, you need to consider two kinds of mistakes that could happen. The test could come back negative even if someone had the disease – this is called a false negative – and the test could come back positive even if someone didn’t have the disease – this is called a false positive.

If a disease is rare (such as COVID19 in Santa Clara county) we need to worry mostly about the false positives. Using test data from the manufacturer, the authors estimate the specificity to be between 98.1 and 99.9%. (When they include their own data, the range becomes 98.3 to 99.9%.) This means that the false positive rate is somewhere between 0.1 and 1.9%. In other words, even if you test only people who have never had the disease, between 0.1 and 1.9% of people would still test positive.

What does all of this mean? 

Imagine we are testing 1000 people in an imaginary Santa Clara county.

Now imagine that 1% of the population has had COVID19. That would be around 10 people out of 1000. But, because people who were recently sick are more likely to volunteer for the study, maybe instead of 10, 20 people out of 1000 are positive. That’s 2% of the sample.

The other 98% of the sample should have a negative test. But, we know that the false positive rate of this test is between 0.1 and 1.9%, which means you’ll get another 1-19 people who test positive even if they never had the disease! Let’s assume for now that we get 10 false positives. Now we have in total 30 positive tests out of 1000 people tested. That could lead you to think that 3% of the sample of 1000 people has had COVID19 and thus 3% of Santa Clara county has had COVID19. Even though the real rate in our imaginary Santa Clara example was only 1%!
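The arithmetic of this imaginary example can be written out as a short Python sketch. The function name and its parameters are made up for illustration. The bias factor of 2 and the 1% false positive rate are the assumptions used in the example above, not numbers from the study.

```python
def apparent_positives(sample_size, true_prevalence, volunteer_bias, false_positive_rate):
    """Split the positive tests of a biased sample into true and false positives.

    Simplifications taken from the example in the text:
    - every infected person tests positive (no false negatives)
    - bias is modelled by multiplying the true prevalence by a factor
    """
    # Infected people are over-represented because they volunteer more often.
    true_positives = sample_size * true_prevalence * volunteer_bias
    # Everyone else in the sample can still produce a false positive.
    uninfected = sample_size - true_positives
    false_positives = uninfected * false_positive_rate
    return true_positives, false_positives


true_pos, false_pos = apparent_positives(
    sample_size=1000,
    true_prevalence=0.01,      # 1% of the imaginary county was infected
    volunteer_bias=2.0,        # infected people are twice as likely to volunteer
    false_positive_rate=0.01,  # one value inside the 0.1-1.9% range
)
total = true_pos + false_pos

print(f"true positives:  {true_pos:.0f}")
print(f"false positives: {false_pos:.1f}")
print(f"apparent rate:   {total / 1000:.1%}")
```

With these inputs the sample looks about three times as infected as the imaginary county really is.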

In the real Santa Clara study, 50 out of 3300 tests were positive (1.5%). In principle, these could all be false positives!
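To see why, here is a minimal sketch of how many false positives you would expect among 3300 tests if nobody tested had ever been infected. The helper function is hypothetical. The only inputs are the test count, the 50 positives, and the 0.1-1.9% false positive range quoted above.

```python
def expected_false_positives(n_tests, false_positive_rate):
    """Expected number of positive results if nobody tested was ever infected."""
    return n_tests * false_positive_rate


n_tests = 3300
observed_positives = 50

# Low and high ends of the false positive rate range quoted in the text.
low = expected_false_positives(n_tests, 0.001)
high = expected_false_positives(n_tests, 0.019)

print(f"expected false positives: between {low:.1f} and {high:.1f}")
print(f"observed positives:       {observed_positives}")
print(f"could all be false positives: {high >= observed_positives}")
```

At the high end of the range, the expected number of false positives is larger than the number of positive tests that were actually observed.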

A lot of experts (here, here and here) are worried that the Stanford researchers have underestimated the false positive rate and have not corrected for their biased sample. And because they didn’t deal well with these two issues, they overestimate the percentage of people in Santa Clara who have had the disease.

How could this be done better?

  1. Get a more random sample. Dr Natalie Dean from University of Florida explains why household testing is the gold standard.
  2. Get a better sense of the false positive rate. Between 0.1 and 1.9% is too wide a range if the number you are trying to measure is likely in the same range.
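To illustrate the second point, the sketch below applies the Rogan-Gladen estimator, a standard textbook correction for test error, to the raw result of 50 positives out of 3300 tests. This is not necessarily the method the Stanford researchers used. The sensitivity of 0.80 is a placeholder assumption that does not come from this post, and the three specificity values are simply points inside the range quoted earlier.

```python
def rogan_gladen(apparent_rate, sensitivity, specificity):
    """Correct a raw positive rate for imperfect sensitivity and specificity."""
    corrected = (apparent_rate + specificity - 1) / (sensitivity + specificity - 1)
    # A negative value means false positives alone can explain every positive test.
    return max(corrected, 0.0)


apparent_rate = 50 / 3300   # raw positive rate quoted in the text
assumed_sensitivity = 0.80  # ASSUMPTION for illustration only

for specificity in (0.981, 0.99, 0.999):
    estimate = rogan_gladen(apparent_rate, assumed_sensitivity, specificity)
    print(f"specificity {specificity:.1%}: corrected prevalence {estimate:.2%}")
```

Under these assumptions the corrected estimate runs from zero to a bit under 2%, depending only on which specificity you believe. That is what "too wide a range" means in practice.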

Why are these numbers in Santa Clara important? 

Why does it matter so much whether 0.5, 1, or 3% of people in Santa Clara have had COVID19?

Well, as of today, 73 people have died of COVID19 in Santa Clara county. If that is 73 out of 40,000 – 80,000 infected – as the Stanford researchers suggest – then the chance of dying of COVID19 is relatively low (infection fatality rate 0.1-0.2%). But if that is 73 out of, say, 10,000 – 20,000 which is more realistic, the chance of dying from COVID19 is higher (infection fatality rate 0.4-0.7%).
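These fatality rates are just deaths divided by infections. A small sketch makes the comparison explicit. The scenario labels and the function name are mine, and the infection counts are the rounded ranges from the paragraph above.

```python
def infection_fatality_rate(deaths, infections):
    """Fraction of infected people who died."""
    return deaths / infections


deaths = 73

# (low, high) number of infected people in each scenario, as given in the text.
scenarios = {
    "Stanford estimate": (40_000, 80_000),
    "more realistic": (10_000, 20_000),
}

for label, (fewest, most) in scenarios.items():
    # More infections for the same number of deaths means a lower fatality rate.
    ifr_low = infection_fatality_rate(deaths, most)
    ifr_high = infection_fatality_rate(deaths, fewest)
    print(f"{label}: infection fatality rate between {ifr_low:.2%} and {ifr_high:.2%}")
```

The same 73 deaths give a fatality rate that is four times higher in the second scenario, simply because the assumed number of infections is four times lower.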

Because the Stanford researchers suggest that there are a ton of people in Santa Clara county who have had the disease and only relatively few who died, they suggest that the disease is maybe not so lethal. Others are taking these results and saying: “Stanford says it’s just like the flu, we can stop the lock-downs and open up the economy!”

Many public health experts think it is way too early to open up the economy and a lot more people will die if we do so now.

In fact, the reason that Santa Clara county has a very low number of people who have had the disease (probably around 1% or lower) is probably that Santa Clara county was one of the first counties in the country to issue a Stay-At-Home order and Stanford University (which is in Santa Clara county) was one of the first universities to close its campus. In many ways, Santa Clara county and Stanford have been an example of how to deal with this epidemic effectively.

I hope that if you read about more studies that use antibody tests, you read critically to determine whether their sample was random and how high the false positive rate is compared to the real positive rate they are trying to estimate.

#StayHomeStaySafe

 

 

How does SARS-CoV2 translation work?

2 Apr

The last couple of days I worked to prepare a lecture on how the coronavirus SARS-CoV2 uses several nifty genetic tricks to translate the proteins it needs to make new viral particles. I first wrote about this on Twitter (here, here and here) and then made a slide deck. The slides are here: https://figshare.com/articles/Coronavirus_covid19_translation_genetics_slides_for_genetics_class/12065649

Why talk about SARS-CoV2 translation?

Do students need to learn about how proteins and stop codons work in SARS-CoV2? I don’t think they really need to, but combining genetics (the topic of my class) with SARS-CoV2 (the topic that everyone is thinking about anyway) is – in my eyes – a good strategy.

I strongly believe that we should always make our classes as relevant to our students as possible. For genetics, this is actually quite easy. Many students have a very strong interest in genetics. This doesn’t mean that they all really want to know everything about Mendel or fruit fly crosses. Many of them really want to know about cancer, GMOs and why some kids look exactly like their parents (or not). Let’s start with what the students want to know!

When I first taught genetics, I used a lot of material from Dr Rosie Redfield’s Useful Genetics class, and a lot of my ideas about teaching genetics are based on her work.

Thread 1: the first polyprotein and how SARS-CoV2 uses our ribosome to make 11 proteins

Thread on threadreader: https://threadreaderapp.com/thread/1244118877803433984.html

Thread 2: how to ignore a stop sign

Thread on threadreader: https://threadreaderapp.com/thread/1244471522477006848.html

Thread 3: jumping RNA-dependent RNA polymerase

Thread on threadreader: https://threadreaderapp.com/thread/1244849592132198400.html