Tag Archives: covid19

Why I don’t believe that 2.5-4% of people in Santa Clara county have had COVID19

19 Apr

Originally posted on April 19th. Small edits on April 21st. Thanks to Scott Roy and Dmitri Petrov for comments. 

This week a study was published by researchers from Stanford about how many people in Santa Clara county have been infected with the new coronavirus SARS-CoV2.

You may not have heard of Santa Clara county, but it’s the heart of Silicon Valley and its most famous residents are Stanford, Google, Apple and IBM. I lived there too for a few years when I was a postdoc at Stanford.

The researchers used a new test to detect antibodies against the virus. Antibody testing is going to be super important in the near future, but I have serious concerns about this study and its conclusions.

The main result of the paper is an estimate that between 2.5 and 4% of people in Santa Clara county have had COVID19. That would mean between 48,000 and 81,000 people. If this is correct, the virus has infected many more people in Santa Clara than the official numbers suggest (50-85 fold more).

If 50-85 times sounds hard to believe, that’s because it is. Even though most experts agree that the real number of infected people is higher than the reported numbers, 50-85 fold higher than reported would be quite crazy. In research, we like to say that “extraordinary claims require extraordinary evidence” (https://en.wikipedia.org/wiki/Sagan_standard). Here the claim is extraordinary but the evidence isn’t. It also shows that even if a study comes from a great university, this is no guarantee that the study is good.

2.5-4% seroprevalence is unlikely in Santa Clara county

Why is 2.5-4% positive in Santa Clara county an extraordinary claim? Because in the European countries where seroprevalence is around 3%, many more people have died (relative to the size of the population) than in Santa Clara county. It would be very unlikely that the infection fatality rate (how likely you are to die when you catch the virus) is significantly lower in Santa Clara than in other parts of the world. For example, the Netherlands also reports a 3% seroprevalence, but has 5.5 times as many deaths per 100,000 people compared to Santa Clara county.

Two issues: biased sample and false positives

In my opinion, there are two main issues with this study. Both lead the Stanford researchers to overestimate the number of people who were infected with SARS-CoV2.

One is that this was probably NOT a random sample. And two is that the false positive rate for this kind of test is high. This means that we don’t know if the people who had a positive test result have really been infected with the virus.

  1. Why is this not a random sample? 

They asked people to volunteer for this study using Facebook ads. Now, I think there is nothing wrong, in principle, with using Facebook ads to recruit people. But I do think that people who have been sick with a fever and cough recently are more likely to volunteer for this study to test whether they’ve had COVID19!

If people who actually had COVID19 were twice as likely to volunteer for the study, it would mean 2x as many positive tests in the sample and thus the conclusion that 2x as many people in the county of Santa Clara have had the disease.
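To make the effect of self-selection concrete, here is a minimal sketch in Python. The function name `sample_prevalence` and the `volunteer_ratio` parameter are made up for this illustration, and the ratios tried are assumptions, not numbers from the study.

```python
def sample_prevalence(true_prevalence, volunteer_ratio):
    """Expected fraction of previously infected people among volunteers.

    volunteer_ratio (an assumed, hypothetical number) is how many times more
    likely a previously infected person is to sign up than someone who was
    never infected.
    """
    infected_weight = true_prevalence * volunteer_ratio
    uninfected_weight = 1.0 - true_prevalence
    return infected_weight / (infected_weight + uninfected_weight)


# Imaginary county where 1% of people have had COVID19.
for ratio in (1, 2, 3):
    share = sample_prevalence(0.01, ratio)
    print(f"volunteer ratio {ratio}x -> {100 * share:.1f}% of the sample infected")
```

With a ratio of 2, the sample comes out at close to 2% infected even though the true value in the imaginary county is 1%. For a rare disease, the overestimate is roughly proportional to the volunteer ratio.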

This is why it is so important in statistics to have what we call “unbiased samples.”

  2. What is the “false positive rate” and why does it matter? 

Whenever you do an antibody test to see if someone has had a disease, you need to consider two kinds of mistakes that could happen. The test could come back negative even if someone had the disease – this is called a false negative – and the test could come back positive even if someone didn’t have the disease – this is called a false positive.

If a disease is rare (such as COVID19 in Santa Clara county) we need to worry mostly about the false positives. Using test data from the manufacturer, the authors estimate the specificity (the percentage of never-infected people who correctly test negative) to be between 98.1 and 99.9%. (When they include their own data, the range becomes 98.3 to 99.9%.) The false positive rate is 100% minus the specificity, so it is somewhere between 0.1 and 1.9%. In other words, even if you test only people who have never had the disease, between 0.1 and 1.9% of people would still test positive.

What does all of this mean? 

Imagine we are testing 1000 people in an imaginary Santa Clara county.

Now imagine that 1% of the population has had COVID19. That would be around 10 people out of 1000. But, because people who were recently sick are more likely to volunteer for the study, maybe instead of 10, 20 people out of 1000 are positive. That’s 2% of the sample.

The other 98% of the sample should have a negative test. But, we know that the false positive rate of this test is between 0.1 and 1.9%, which means you’ll get another 1-19 people who test positive even if they never had the disease! Let’s assume for now that we get 10 false positives. Now we have in total 30 positive tests out of 1000 people tested. That could lead you to think that 3% of the sample of 1000 people has had COVID19 and thus 3% of Santa Clara county has had COVID19. Even though the real rate in our imaginary Santa Clara example was only 1%!
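The arithmetic of this imaginary example can be written out as a short Python sketch. The function name `apparent_positive_share` is made up for illustration, and the code assumes that every truly infected person tests positive (no false negatives), which is a simplification.

```python
def apparent_positive_share(sample_size, truly_infected, false_positive_rate):
    """Share of the sample that tests positive.

    Simplifying assumption: every truly infected person tests positive
    (no false negatives).
    """
    never_infected = sample_size - truly_infected
    false_positives = never_infected * false_positive_rate
    return (truly_infected + false_positives) / sample_size


# 1000 people tested, 20 of them truly infected (2% of the sample).
for fpr in (0.001, 0.01, 0.019):
    share = apparent_positive_share(1000, 20, fpr)
    print(f"false positive rate {100 * fpr:.1f}% -> {100 * share:.1f}% test positive")
```

Across the range of false positive rates, the share of positive tests runs from about 2% to almost 4%, while the real rate in the imaginary county stays at 1%.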

In the real Santa Clara study, 50 out of 3300 tests were positive (1.5%). In principle, these could all be false positives!
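A rough check of that statement in Python: the sketch below computes how many positive results you would expect if nobody in the sample had ever been infected. The function name is made up for illustration, and this is only the expected count. It ignores chance variation from one sample to the next.

```python
def expected_false_positives(tests, false_positive_rate):
    """Expected number of positive results if nobody tested was ever infected."""
    return tests * false_positive_rate


TESTS = 3300
OBSERVED_POSITIVES = 50

# Low and high end of the false positive rate range reported by the authors.
for fpr in (0.001, 0.019):
    expected = expected_false_positives(TESTS, fpr)
    verdict = "could explain" if expected >= OBSERVED_POSITIVES else "cannot explain"
    print(
        f"false positive rate {100 * fpr:.1f}%: about {expected:.0f} false positives, "
        f"which {verdict} all {OBSERVED_POSITIVES} observed positives"
    )
```

At the low end of the range, false positives account for only a handful of the 50 positive tests. At the high end, they would be enough to account for all of them.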

A lot of experts (here, here and here) are worried that the Stanford researchers have underestimated the false positive rate and have not corrected for their biased sample. And because they didn’t deal well with these two issues, they overestimate the percentage of people in Santa Clara who have had the disease.

How could this be done better?

  1. Get a more random sample. Dr Natalie Dean from the University of Florida explains why household testing is the gold standard.
  2. Get a better sense of the false positive rate. Between 0.1 and 1.9% is too wide a range if the number you are trying to measure is likely in the same range.

Why are these numbers in Santa Clara important? 

Why does it matter so much whether 0.5, 1, or 3% of people in Santa Clara have had COVID19?

Well, as of today, 73 people have died of COVID19 in Santa Clara county. If that is 73 out of 40,000 – 80,000 infected – as the Stanford researchers suggest – then the chance of dying of COVID19 is relatively low (infection fatality rate 0.1-0.2%). But if that is 73 out of, say, 10,000 – 20,000 which is more realistic, the chance of dying from COVID19 is higher (infection fatality rate 0.3-0.7%).
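The infection fatality rates above are simply deaths divided by infections. Here is a small Python sketch of that division, using the numbers from this paragraph. The function name and the scenario labels are made up for illustration.

```python
def infection_fatality_rate(deaths, infections):
    """Fraction of infected people who died."""
    return deaths / infections


DEATHS = 73

# Number of infected people under each scenario discussed in the text.
scenarios = {
    "Stanford range, low end": 40_000,
    "Stanford range, high end": 80_000,
    "more realistic range, low end": 10_000,
    "more realistic range, high end": 20_000,
}

for label, infections in scenarios.items():
    ifr = infection_fatality_rate(DEATHS, infections)
    print(f"{label}: {infections} infected -> infection fatality rate {100 * ifr:.2f}%")
```

The same 73 deaths give a several-fold higher fatality rate when the number of infected people is 10,000 to 20,000 instead of 40,000 to 80,000.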

Because the Stanford researchers suggest that there are a ton of people in Santa Clara county who have had the disease and only relatively few who died, they suggest that the disease is maybe not so lethal. Others are taking these results and saying: “Stanford says it’s just like the flu, we can stop the lock-downs and open up the economy!”

Many public health experts think it is way too early to open up the economy and a lot more people will die if we do so now.

In fact, the reason that Santa Clara county has a very low number of people who have had the disease (probably around 1% or lower) is probably that Santa Clara county was one of the first counties in the country to issue a Stay-At-Home order and Stanford University (which is in Santa Clara county) was one of the first universities to close its campus. In many ways, Santa Clara county and Stanford have been an example of how to deal with this epidemic effectively.

I hope that if you read about more studies that use antibody tests, you read critically to determine whether their sample was random and how high the false positive rate is compared to the real positive rate they are trying to estimate.

#StayHomeStaySafe

New video about how SARS-CoV2 spreads

28 Mar

I worked with Brandon Ogbunu, Senay Yitbarek and Olivia Pham to make this video about how the SARS-CoV2 virus, which causes COVID19, spreads.

Hope it’s useful!

 

New video: COVID19 in numbers: R0, the case fatality rate and why we need to flatten the curve

11 Mar

Pleuni Pennings, Senay Yitbarek and Brandon Ogbunu are asking all mayors and presidents to help reduce R0 for the SARS-CoV2 / COVID19 outbreak by canceling events and washing your hands.

Brandon Ogbunu (Brown University),  Senay Yitbarek (UC Berkeley) and I (Pleuni Pennings, SFSU) made a video about the two numbers most often used to describe the new coronavirus outbreak: R0 and the case fatality rate. We also talk about why we should and how we can “flatten the curve.”

Feel free to share, use as a homework assignment, or show in the classroom! Ideal for college level biology and calculus classes.

 

Translations kindly contributed by the following people:

Dutch translation by Alex Verkade.
Spanish translation by Berenice Chavez and Cecilia Hernandez.
Portuguese translation by Murillo Rodrigues and Luiza Ostrowski.

The video is also on YouTube: https://youtu.be/-3xZVhFhP8w

Download the slides here: COVID19_FlatteningTheCurveSlidesMarch172020

Using phylogenies to understand the novel coronavirus outbreak

3 Mar

I made a video about the use of phylogenies to understand the current coronavirus outbreak. I hope it is useful for your class if you teach genetics, evolution or virology. Feel free to show it in your lecture or assign it as homework. On the Vimeo website, you can download the video.

 

Link to Nextstrain.org

Slides: Phylogenies and the Corona Virus Outbreak

Link to original video on Vimeo (which you can download): https://vimeo.com/395051566

Link to Trevor Bedford’s website: https://bedford.io/team/trevor-bedford/

Link to the original tweet I refer to in the video: https://twitter.com/trvrb/status/1233970271318503426?s=20

Link to blog post by Trevor Bedford on same topic: https://bedford.io/blog/ncov-cryptic-transmission/