Archive by Author

How we run an inclusive & online coding program for biology and chem undergrads in 2020 

7 May

By: Nicole Adelstein, Pleuni Pennings, Rori Rohlfs

Coding summer program (BDSP) in 2018: this team (led by Chinomnso Okorie) met in the “yellow room” for 8 hours a week to learn R.

We have been running combined coding/research summer programs for several years, with a focus on undergraduate students, women, and students from historically underrepresented racial and ethnic groups. This summer, we will run our 9-week program as an online program. We think that others may be interested in doing this too, so we’ll share here how we plan to do it.

Some of the information below will also be published as a “ten rules paper” in PLOS Computational Biology*, but we wanted to share this sooner and focus on doing things online vs in person.

TL;DR version

  1. Have students work in teams of 4 or 5, for 2 hours per day, 4 days a week. Learning to code should be done part-time, even if your program is full time. 
  2. Use near-peer mentors to facilitate the team meetings (not to teach, but to facilitate). 
  3. Use existing online courses – we’ll share a few that we like. Don’t try to make your own curriculum last minute. There are good online courses available. 
  4. Give the students a simple (repeat: simple!) research project to work on together. 

1. Have students work in teams for two hours a day – with pre-set times. 

Learning to code is stressful and tiring. Even though many students may not have jobs this summer, that doesn’t mean they can code for 8 hours a day. First, they have other things to do (like taking care of family members), and second, there’s a limit to how long you can be an effective learner.

Our program is 10 hours per week (8 hours of coding, 2 hours of “all-hands” meeting). We make it clear that no work is expected outside of these hours. For example, a team may meet from 10am to 12pm four days a week for coding. 

Check-ins, quiet working, shared problem solving. 

During the coding hours, the near-peer mentor is always present (on Zoom, of course!) and facilitates the meeting. The very first day should be all about introductions and expectations. After that, we suggest that every day, there is time for check-ins (everybody shares how they are doing, what they’re excited about or struggling with, or what music they’re listening to), quiet working (mute all microphones, set a timer, everybody works on the online class by themselves) and shared problem solving (for example, let’s talk about the assignment X from the online class). One of the mentors last year was successful with starting every meeting with a guided meditation. 

Each team has a faculty mentor in our program (this could be a postdoc or faculty member). Once a week, the faculty mentor joins the meeting for about 1 hour. This hour could consist of introductions / check-ins, a short presentation or story by the faculty mentor, and the opportunity for the team to ask questions. It’s great if the near-peer mentor and the team prepare questions beforehand. 

1B. Add a non-coding meeting (if you can/want)

In addition to the 8 coding hours per week, our students also meet for 2 hours per week in an “all-hands meeting”. Such an all-hands meeting is not absolutely necessary, but if you have the bandwidth, it may be nice to meet once a week to do something other than coding, for example to read a paper together or meet with someone online (an alum who is now somewhere else? A faculty member or grad student?).

If your program is full time (like an REU program), we suggest still doing only about 8-15 hours of coding per week. Fill up the rest with more standard things such as lectures, reading, etc. (and don’t make anyone do Zoom 40 hours a week!). If students are enjoying themselves with coding and getting more confident, they may do more coding by themselves, but in our program that is not the expectation.

2. Mentors and teams are key 

When working alone, we’ve often seen students get stuck on technical problems, leaving many feeling lost and inadequate and wanting to discontinue learning this new skill. Working in a mentored team, however, students have access to immediate support from their peers and mentor. This helps them learn technical skills more efficiently, develop relationships with each other, and cultivate a shared sense of belonging in computational research (Kephart et al. 2008). We recommend that each participant in a coding summer program be assigned to a team of 4 to 5 students with similar technical skill levels led by a near-peer mentor. 

Mentors in our program are typically a year or two ahead of participants but belong to similar demographic groups and come from similar academic backgrounds. The mentor facilitates the meetings and leads the team in learning skills and applying them to a research question (without doing the work themselves). 

Each team also has a faculty advisor, who comes up with a research project that is likely to be completed in the available time and that is of interest to the students (Harackiewicz et al. 2008). The faculty advisor meets with the whole team at least once per week to guide learning and research. Of note, acting as a mentor improves students’ retention and success in STEM (Trujillo et al. 2015); therefore, this setup benefits mentors as well as mentees.

2B. Who can be mentors? 

Over the years, we have found that near-peer mentors are incredibly useful for a number of reasons, including 1) student participants are more likely to ask for help from a near-peer mentor than from a faculty advisor, 2) near-peer mentors serve as role models, giving participants an idea of what they can aim for in the next year or two, and 3) the use of mentors allows the program to serve many more participants than it could if it relied on a few time-pressed faculty advisors. Our selection criteria for mentors include essential knowledge (for example, the mentor for a team doing an advanced chemistry research project should have taken physical chemistry), mentoring experience or potential, logistical availability, and having a demographic background similar to that of the participants. Mentors don’t need experience with the specific coding language or research topic they will work on with their team. Rather than being the expert in the room, they are expected to help team members work together to find solutions or formulate questions for the faculty advisor.

Mentors are crucial for the success of the program and need to be paid well for their work. Each week of the program, we pay our mentors a competitive wage for 8 contact hours with their team, a 2-hour all-hands lunch meeting, a 2-hour mentor meeting, and 3-4 additional hours to account for preparation. However, we realize that this summer, things may be different for many! You may find that PhD students or Master’s students who cannot work in the lab (but are still paid / on a fellowship) could be excellent near-peer mentors. Just make sure that the mentors know that this is a real commitment that will eat up a significant chunk of time each week.

3. Identify an appropriate online course for each team

We have found that when learning basic coding skills, interactive online classes to learn computer programming (for example, from Datacamp, Udacity or Coursera) motivate and engage students better than books or online texts. Yet, when working individually, most students – especially beginners and historically underrepresented students – don’t finish online classes (Ihsen et al. 2013; Jordan 2015). As a solution, we have found that in teams, where students can work together and support each other, they learn a great deal from an online class. 

Each team’s faculty advisor picks a free, clearly structured online class with videos and assignments to teach participants coding skills. We have had good experiences with Udacity’s Exploratory Data Analysis course because this class is suitable for beginners. It does a good job motivating students to think about data and learn R. In early team meetings, participants spend time quietly working on the online class with their headphones on, followed by a team discussion or collaborative problem-solving session. If students encounter difficulty with any of the material, mentors may develop mini-lectures or create their own exercises to facilitate learning. Note, the students’ goal is not necessarily to finish the online course, but to learn enough to perform their research project. 

3B. Suggested classes:

Udacity Exploratory Data Analysis with R https://www.udacity.com/course/data-analysis-with-r--ud651

CodeHS https://codehs.com/ (the faculty mentor or the near-peer mentor needs to create a section on CodeHS; we use the Introduction to Python (Rainforest) course).

Coursera https://www.coursera.org/learn/r-programming (this one is a tip from our UCSF colleague Dr Kala Mehta)

4. Assign each team a simple and engaging research project 

Learning to code without a specific application in mind can feel boring and irrelevant, sometimes leading students to abandon the effort. In our summer program, teams carry out a research project to motivate them to learn coding skills, improve their sense of belonging in science (Jones, Barlow, and Villarejo 2010) and cultivate their teamwork and time/project management skills. Faculty advisors assign each team a research project early in the program. These projects should answer real questions so that participants feel their work is valuable (Woodin, Carter, and Fletcher 2017). The projects should also be relatively simple. Small and self-contained projects that can be completed within a three-week time frame are ideal to ensure completion and make participants feel that their efforts have been successful. For example, past research projects in our program, which reflect the interests of faculty advisors and the students, include writing computer simulations to model the evolution of gene expression, analyzing bee observations from a large citizen science project, examining trends in Google search term data with respect to teen birth outcomes, and building an app for finding parking spots on or near campus.

For 2020, we’d like to encourage you to pick a project that appears extremely simple if you normally use R or Python to make your plots / do stats, but that would be quite challenging if you’re new to coding. We also suggest that – unless the students are already quite advanced – you don’t give them a project that you want to publish on quickly. Nobody needs more pressure this summer.  

Here are some suggestions for simple research projects:

  1. Let students plot the number of COVID19 cases in their county over time using R. Let them plot the number of cases in 5 different counties on the same figure. Add an arrow for when a stay-at-home order was implemented or terminated. Easy-to-download data are here: https://github.com/nytimes/covid-19-data 
  2. Let students keep track of how many steps they take each day for 10 days using their phone or watch. Let them plot the number of steps per day using R. Let them add a line for the mean. Collect data from 6 people and create a pdf with 6 plots in different colors. 
  3. If you have any data from your lab, let the students plot those data. Try making 4 different plots with the same data (scatter, box, histogram, etc). 
  4. Let students recreate an existing plot from a publication when the data are available. 
  5. Let students analyze (anonymized) data from your class. How strong is the correlation between midterm grades and final exam grades? Do students who hand in homework regularly do better on the test? 
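
To give a sense of how small such a project can be, here is a minimal sketch of suggestion 5 (the correlation between midterm and final exam grades). It is written in Python rather than R, and the grades are made up for illustration; the function name `pearson_r` is ours and not part of any course material.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and the two sums of squared deviations
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when all values are identical")
    return cov / math.sqrt(var_x * var_y)


# Made-up, anonymized grades for six students (illustration only)
midterm = [55, 62, 70, 78, 85, 91]
final = [58, 60, 75, 74, 88, 95]

r = pearson_r(midterm, final)
print(f"Correlation between midterm and final exam grades: r = {r:.2f}")
```

A team new to coding could spend a good part of a week getting from a spreadsheet of grades to this one number and a scatter plot, which is about the right size for a first project.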

* reference: Pleuni Pennings, Mayra M. Banuelos, Francisca L. Catalan, Victoria R. Caudill, Bozhidar Chakalov, Selena Hernandez, Jeanice Jones, Chinomnso Okorie, Sepideh Modrek, Rori Rohlfs, Nicole Adelstein. Ten simple rules for an inclusive summer coding program for non-CS undergraduates, accepted for publication in PLOS Computational Biology.

Why I don’t believe that 2.5-4% of people in Santa Clara county have had COVID19

19 Apr

Originally posted on April 19th. Small edits on April 21st. Thanks to Scott Roy and Dmitri Petrov for comments. 

This week a study was published by researchers from Stanford about how many people in Santa Clara county have been infected with the new coronavirus SARS-CoV2.

You may not have heard of Santa Clara county, but it’s the heart of Silicon Valley and its most famous residents are Stanford, Google, Apple and IBM. I lived there too for a few years when I was a postdoc at Stanford.

The researchers used a new test to detect antibodies against the virus. Antibody testing is going to be super important in the near future, but I have serious concerns about this study and its conclusions.

The main result from the paper is that they estimate that between 2.5 and 4% of people in Santa Clara have had COVID19. That would mean between 48,000 and 81,000 people. If this is correct, it would mean that the virus has infected many more people in Santa Clara than the official numbers suggest (50-85 fold more).

If 50-85 times sounds hard to believe, that’s because it is. Even though most experts agree that the real number of infected people is higher than the reported numbers, 50-85 fold higher than reported would be quite crazy. In research, we like to say that “extraordinary claims require extraordinary evidence” (https://en.wikipedia.org/wiki/Sagan_standard). Here the claim is extraordinary but the evidence isn’t. Also, we learn that even if a study comes from a great university – this is no guarantee that the study is good.

2.5-4% seroprevalence is unlikely in Santa Clara county

Why is 2.5-4% positive in Santa Clara county an extraordinary claim? This is because in the European countries where seroprevalence is around 3%, many more people have died (relative to the size of the population) than in Santa Clara county. It would be very unlikely that the infection fatality rate (how likely you are to die when you catch the virus) is significantly lower in Santa Clara than in other parts of the world. For example, The Netherlands also reports a 3% seroprevalence, but has 5.5 times as many deaths per 100,000 people compared to Santa Clara county.

Two issues: biased sample and false positives

In my opinion, there are two main issues with this study. Both cause the Stanford researchers to overestimate the number of people who were infected with SARS-CoV2.

One is that this was probably NOT a random sample. And two is that the false positive rate for this kind of test is high. This means that we don’t know if the people who had a positive test result have really been infected with the virus.

  1. Why is this not a random sample? 

They asked people to volunteer for this study using Facebook ads. Now, I think there is nothing wrong, in principle, with using Facebook ads to recruit people. But I do think that people who have been sick with a fever and cough recently are more likely to volunteer for this study to test whether they’ve had COVID19!

If people who actually had COVID19 were twice as likely to volunteer for the study, it would mean 2x as many positive tests in the sample and thus the conclusion that 2x as many people in the county of Santa Clara have had the disease.

This is why it is so important in statistics to have what we call “unbiased samples.”

  2. What is the “false positive rate” and why does it matter? 

Whenever you do an antibody test to see if someone has had a disease, you need to consider two kinds of mistakes that could happen. The test could come back negative even if someone had the disease – this is called a false negative – and the test could come back positive even if someone didn’t have the disease – this is called a false positive.

If a disease is rare (such as COVID19 in Santa Clara county) we need to worry mostly about the false positives. Using test data from the manufacturer, the authors estimate the specificity to be between 98.1% and 99.9%. (When they include their own data, the range becomes 98.3% to 99.9%.) Specificity is the fraction of never-infected people who correctly test negative, so the false positive rate is 100% minus the specificity: somewhere between 0.1 and 1.9%. In other words, even if you test only people who have never had the disease, between 0.1 and 1.9% of people would still test positive.

What does all of this mean? 

Imagine we are testing 1000 people in an imaginary Santa Clara county.

Now imagine that 1% of the population has had COVID19. That would be around 10 people out of 1000. But, because people who were recently sick are more likely to volunteer for the study, maybe instead of 10, 20 people out of 1000 are positive. That’s 2% of the sample.

The other 98% of the sample should have a negative test. But, we know that the false positive rate of this test is between 0.1 and 1.9%, which means you’ll get another 1-19 people who test positive even if they never had the disease! Let’s assume for now that we get 10 false positives. Now we have in total 30 positive tests out of 1000 people tested. That could lead you to think that 3% of the sample of 1000 people has had COVID19 and thus 3% of Santa Clara county has had COVID19. Even though the real rate in our imaginary Santa Clara example was only 1%!

In the real Santa Clara study, 50 out of 3300 tests were positive (1.5%). In principle, these could all be false positives!
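
If you want to check this arithmetic yourself, here is a minimal sketch in Python. It uses only the numbers from the imaginary example and the real study above; the variable and function names are ours, and it makes the simplifying assumption that every person who really had the disease tests positive (no false negatives).

```python
# False positive rate range from the post: 100% minus specificity (98.1% to 99.9%)
FPR_LOW, FPR_HIGH = 0.001, 0.019


def expected_false_positives(never_infected, false_positive_rate):
    """Expected number of positive tests among people who never had the disease."""
    return never_infected * false_positive_rate


# Imaginary county: 1000 people tested, 1% true prevalence, and infected
# people assumed to be twice as likely to volunteer (so 20 in the sample).
sample_size = 1000
infected_in_sample = sample_size * 0.01 * 2
never_infected = sample_size - infected_in_sample

fp_low = expected_false_positives(never_infected, FPR_LOW)
fp_high = expected_false_positives(never_infected, FPR_HIGH)
print(f"Imaginary sample, expected false positives: {fp_low:.1f} to {fp_high:.1f}")

# With the 10 false positives assumed in the text, 30 of 1000 tests are positive.
apparent_rate = (infected_in_sample + 10) / sample_size
print(f"Apparent rate: {apparent_rate:.1%}, true rate: 1.0%")

# Real study: 50 positives out of 3300 tests. Worst case: nobody was infected,
# so all 3300 people can only produce false positives.
real_fp_low = expected_false_positives(3300, FPR_LOW)
real_fp_high = expected_false_positives(3300, FPR_HIGH)
could_all_be_false = real_fp_high >= 50
print(f"Real study, expected false positives: {real_fp_low:.1f} to {real_fp_high:.1f}")
print(f"Could all 50 positives be false positives? {could_all_be_false}")
```

The point of the sketch is the comparison in the last lines: at the high end of the false positive range, the expected number of false positives in a sample of 3300 is larger than the 50 positives that were observed.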

A lot of experts (here, here and here) are worried that the Stanford researchers have underestimated the false positive rate and have not corrected for their biased sample. And because they didn’t deal well with these two issues, they overestimate the percentage of people in Santa Clara who have had the disease.

How could this be done better?

  1. Get a more random sample. Dr Natalie Dean from University of Florida explains why household testing is the gold standard.
  2. Get a better sense of the false positive rate. Between 0.1 and 1.9% is too wide a range if the number you are trying to measure is likely in the same range.

Why are these numbers in Santa Clara important? 

Why does it matter so much whether 0.5, 1, or 3% of people in Santa Clara have had COVID19?

Well, as of today, 73 people have died of COVID19 in Santa Clara county. If that is 73 out of 40,000 – 80,000 infected – as the Stanford researchers suggest – then the chance of dying of COVID19 is relatively low (infection fatality rate 0.1-0.2%). But if that is 73 out of, say, 10,000 – 20,000, which is more realistic, the chance of dying from COVID19 is higher (infection fatality rate 0.4-0.7%).
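
These infection fatality rates are simply deaths divided by infections. Here is a minimal sketch of that division in Python, using the 73 deaths and the two ranges of infections from the paragraph above; the function name and the scenario labels are ours, for illustration only.

```python
def infection_fatality_rate_percent(deaths, infected):
    """Deaths as a percentage of everyone who was infected."""
    return 100 * deaths / infected


deaths = 73

# (fewest infections, most infections) for each scenario in the text
scenarios = {
    "Stanford estimate": (40_000, 80_000),
    "More realistic": (10_000, 20_000),
}

for label, (fewest, most) in scenarios.items():
    # More infections for the same number of deaths means a lower fatality rate
    ifr_low = infection_fatality_rate_percent(deaths, most)
    ifr_high = infection_fatality_rate_percent(deaths, fewest)
    print(f"{label}: infection fatality rate between {ifr_low:.2f}% and {ifr_high:.2f}%")
```

The number of deaths is the same in both scenarios, so the estimated fatality rate depends entirely on how many infections you believe there were.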

Because the Stanford researchers suggest that there are a ton of people in Santa Clara county who have had the disease and only relatively few who died, they suggest that the disease is maybe not so lethal. Others are taking these results and saying: “Stanford says it’s just like the flu, we can stop the lock-downs and open up the economy!”

Many public health experts think it is way too early to open up the economy and a lot more people will die if we do so now.

In fact, the reason that Santa Clara county has a very low number of people who have had the disease (probably around 1% or lower) is probably that Santa Clara county was one of the first counties in the country to issue a Stay-At-Home order and Stanford University (which is in Santa Clara county) was one of the first universities to close its campus. In many ways, Santa Clara county and Stanford have been an example of how to deal with this epidemic effectively.

I hope that if you read about more studies that use antibody tests, you read critically to determine whether their sample was random and how high the false positive rate is compared to the real positive rate they are trying to estimate.

#StayHomeStaySafe

 

 

How does SARS-CoV2 translation work?

2 Apr

The last couple of days I worked to prepare a lecture on how the coronavirus SARS-CoV2 uses several nifty genetics tricks to translate the proteins it needs to make new viral particles. I first wrote about this on Twitter (here, here and here) and then made a slide deck. The slides are here: https://figshare.com/articles/Coronavirus_covid19_translation_genetics_slides_for_genetics_class/12065649

Why talk about SARS-CoV2 translation?

Do students need to learn about how proteins and stop codons work in SARS-CoV2? I don’t think they really need to, but combining genetics (the topic of my class) with SARS-CoV2 (the topic that everyone is thinking about anyway) is – in my eyes – a good strategy.

I strongly believe that we should always make our classes as relevant to our students as possible. For genetics, this is actually quite easy. Many students have a very strong interest in genetics. This doesn’t mean that they all really want to know everything about Mendel or fruit fly crosses. Many of them really want to know about cancer, GMOs and why some kids look exactly like their parents (or not). Let’s start with what the students want to know!

When I first taught genetics, I used a lot of material from Dr Rosie Redfield’s Useful Genetics class, and a lot of my ideas about teaching genetics are based on her work.

Thread 1: the first polyprotein and how SARS-CoV2 uses our ribosome to make 11 proteins

Thread on threadreader: https://threadreaderapp.com/thread/1244118877803433984.html

Thread 2: how to ignore a stop sign

Thread on threadreader: https://threadreaderapp.com/thread/1244471522477006848.html

Thread 3: jumping RNA-dependent RNA polymerase

Thread on threadreader: https://threadreaderapp.com/thread/1244849592132198400.html

New video about how SARS-CoV2 spreads

28 Mar

 

I worked with Brandon Ogbunu, Senay Yitbarek and Olivia Pham to make this video about how the SARS-CoV2 virus, which causes COVID19, spreads.

Hope it’s useful!

 

New video: COVID19 in numbers: R0, the case fatality rate and why we need to flatten the curve

11 Mar

Pleuni Pennings, Senay Yitbarek and Brandon Ogbunu are asking all mayors and presidents to help reduce R0 for the SARS-CoV2 / COVID19 outbreak by canceling events and washing your hands.

Brandon Ogbunu (Brown University), Senay Yitbarek (UC Berkeley) and I (Pleuni Pennings, SFSU) made a video about the two numbers most often used to describe the new coronavirus outbreak: R0 and the case fatality rate. We also talk about why we should and how we can “flatten the curve.”

Feel free to share, use as homework assignment, show in the classroom! Ideal for college level biology and calculus classes.

 

Translations kindly contributed by the following people:

Dutch translation by Alex Verkade.
Spanish translation by Berenice Chavez and Cecilia Hernandez.
Portuguese translation by Murillo Rodrigues and Luiza Ostrowski.

The video is also on YouTube: https://youtu.be/-3xZVhFhP8w

Download the slides here: COVID19_FlatteningTheCurveSlidesMarch172020

Using phylogenies to understand the novel coronavirus outbreak

3 Mar

I made a video about the use of phylogenies to understand the current coronavirus outbreak. I hope it is useful for your class if you teach genetics, evolution or virology – feel free to show in your lecture or assign as homework. On the Vimeo website, you can download the video.

 

Link to Nextstrain.org

Slides: Phylogenies and the Corona Virus Outbreak

Link to original video on Vimeo (which you can download): https://vimeo.com/395051566

Link to Trevor Bedford’s website: https://bedford.io/team/trevor-bedford/

Link to the original tweet I refer to in the video: https://twitter.com/trvrb/status/1233970271318503426?s=20

Link to blog post by Trevor Bedford on same topic: https://bedford.io/blog/ncov-cryptic-transmission/

Meet Francisca Catalan, SFSU PINC alum and research associate at UCSF

9 Jan

  1. How did you get into coding? 

I took a regular CS class my second year at SF State. I thought it would be a good skill to have as an aspiring researcher and saw that it fulfilled one of my major requirements. It was a PowerPoint-heavy 8 am class three times a week. I didn’t talk to anyone else in the class and by the end of the semester I found it very difficult to show up. I passed the class but was really devastated about my experience. I thought I could never learn to program, though I never gave up completely. A couple semesters went by and I saw a friendly flier announcing PINC, SFSU’s program that promotes inclusivity in computing for biologists and other non-computer science majors. I eagerly signed up and started the “Intro to Python” class soon after. Then, with some more programming under my belt, I joined Dr. Rohlfs’ lab and began doing research in the dry lab for the remainder of my undergraduate career.

  2. What kind of work do you do now? 

I currently work at UCSF as a dry lab research associate. Our lab focuses on an aggressive form of brain cancer, glioblastoma. We try to find gene targets for new drug treatments and research the cell type of these cancerous cells in order to fight drug resistance. My main duties now include creating pipelines for our single cell, RNA-Seq, and Whole Genome Sequencing data. You can read about our lab’s latest study in our new publication in Cancer Discovery! DOI: 10.1158/2159-8290.

https://cancerdiscovery.aacrjournals.org/content/candisc/early/2019/09/25/2159-8290.CD-19-0329.full.pdf

  3. How did learning coding skills impact your career?

Coding has opened so many pathways for me. I was able to find a great job at UCSF soon after graduating with my Bachelor of Science in cell and molecular biology and minor in Computing Applications. It has also given me a giant boost of confidence! As a woman of color in STEM, I often felt underrepresented and out of place, but those feelings now quickly subside when I can help my colleagues answer coding questions! It’s motivating to feel like a necessary component of your community when oftentimes you feel pushed out. It’s also impacted my career choices! I know now I want to be a professor in the future; I want to provide access to programming to others in hopes it will open pathways like it did for me!

  4. Do you have any advice for students who are just starting? 

Yes! Don’t give up! It can be really difficult to learn coding, but know that it’s not you; talking to a computer can just be hard sometimes! Continue practicing and ask questions, google your heart out. Take breaks when necessary, remember to breathe, and keep in mind all the amazing science you will be able to do once you have these skills under your belt!