Tag Archives: teaching

SFSU bio and chem Master’s students do machine learning and scicomm

20 May

This semester (spring 2021) I taught a new class together with my colleagues Dax Ovid and Rori Rohlfs: Exploratory Data Science for Scientists. This class is part of our new GOLD program through which Master’s students can earn a certificate in Data Science for Biology and Chemistry (link). We were happily surprised when 38 students signed up for the class! 

In the last few weeks of the class I taught some machine learning, and as their final project, students had to find their own images to do image classification with a convolutional neural network. Then they had to communicate their science to a wide audience through a blog, a video or Twitter. Here are the results! I am very proud 🙂

If you are interested in the materials we used, let me know.


Two teams made videos about their final project: 

Anjum Gujral, Jan Mikhale Cajulao, Carlos Guzman and Cillian Variot classified flowers and trees. 

Ryan Acbay, Xavier Plasencia, Ramon Rodriguez and Amanda Verzosa looked at Asian and African elephants. 


Three teams decided to use Twitter to share their results. 

Jacob Gorneau, Pooneh Kalhori, Ariana Nagainis, Natassja Punak and Rachel Quock looked at male and female moths. 

Joshua Vargas Luna, Tatiana Marrone, Roberto (Jose) Rodrigues, Ale (Patricia) Castruita and Dacia Flores classified sand dollars. 

Jessica Magana, Casey Mitchell and Zachary Pope found cats and dogs. 


Finally, four teams wrote blogs about their projects:

Adrian Barrera-Velasquez, Rudolph Cheong, Huy Do and Joel Martinez studied bagels and donuts. 

Jeremiah Ets-Hokin, Carmen Le, Saul Gamboa Peinada and Rebecca Salcedo were excited about dogs! 

Teagan Bullock, Joaquin Magana, Austin Sanchez and Michael Ward worked with memes. 

Musette Caldera, Lorenzo Mena and Ana Rodriguez Vega classified trees and flowers. 


Using phylogenies to understand the novel coronavirus outbreak

3 Mar

I made a video about the use of phylogenies to understand the current coronavirus outbreak. I hope it is useful for your class if you teach genetics, evolution or virology – feel free to show it in your lecture or assign it as homework. On the Vimeo website, you can download the video.


Link to Nextstrain.org

Slides: Phylogenies and the Corona Virus Outbreak

Link to original video on Vimeo (which you can download): https://vimeo.com/395051566

Link to Trevor Bedford’s website: https://bedford.io/team/trevor-bedford/

Link to the original tweet I refer to in the video: https://twitter.com/trvrb/status/1233970271318503426?s=20

Link to blog post by Trevor Bedford on same topic: https://bedford.io/blog/ncov-cryptic-transmission/

Recursion in real life

7 Oct

This semester, I am teaching a new class: Intro To Programming. I try to find ways to explain stuff so that all of my students (and I) understand it. Recursion is complex, but it reminds me of trying to make lunch plans when several people are involved. And when a function calls itself, it’s like putting a conversation on hold.

Recursion is when three calls are put on hold before a lunch decision is made.

Asha would like to have lunch with Blake.

They call Blake.

They say: Hi Blake, would you like to go for lunch in the student center?

Blake says to Asha: That’s a cool idea, but I was hoping to meet with Cynthia today.

Let me put you on hold and call them.

They call Cynthia

They say: Hi Cynthia, would you like to go for lunch in the student center?

Cynthia says to Blake: That’s a cool idea, but I was hoping to meet with Danny today.

Let me put you on hold and call them.

They call Danny

They say: Hi Danny, would you like to go for lunch in the student center?

Danny says to Cynthia: That’s a cool idea, but I was hoping to meet with Emilia today.

Let me put you on hold and call them.

They call Emilia

They say: Hi Emilia, would you like to go for lunch in the student center?

Emilia says to Danny: Yes! I’ll meet you at the student center.

Emilia hangs up

Danny says to Cynthia: Yes! I’ll meet you at the student center.

Danny hangs up

Cynthia says to Blake: Yes! I’ll meet you at the student center.

Cynthia hangs up

Blake says to Asha: Yes! I’ll meet you at the student center.

Blake hangs up

Asha is happy. All phone calls are ended and our friends can go to lunch!
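The lunch story can be sketched as a small recursive function. This is only an illustration in Python (the class language is not stated here), and the function name `invite` and the `log` list are made up for this example. Each call to `invite` is one phone call: if the person who picks up wants to check with someone else first, the current call is put on hold while a new call (a new `invite`) is made. Only when the last friend says yes do the calls on hold get their answers, in reverse order.

```python
def invite(friends, caller=0, log=None):
    """Person number `caller` phones the next friend in the list.

    Assumes the list holds at least two names. Returns the list of
    events, in the order in which they happen.
    """
    if log is None:
        log = []
    callee = caller + 1
    log.append(f"{friends[caller]} calls {friends[callee]}")
    if callee < len(friends) - 1:
        # Recursive case: the callee wants to ask someone else first,
        # so this call goes on hold while a new call is made.
        log.append(f"{friends[callee]} puts {friends[caller]} on hold")
        invite(friends, callee, log)
    # Either the callee was the last friend (base case) or the inner
    # call has finished, so the callee can now answer and hang up.
    log.append(f"{friends[callee]} says yes to {friends[caller]} and hangs up")
    return log


lunch_log = invite(["Asha", "Blake", "Cynthia", "Danny", "Emilia"])
for event in lunch_log:
    print(event)
```

With these five friends, three calls are put on hold before Emilia says yes, and the answers then travel back from Danny to Cynthia to Blake to Asha, just as in the story.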


Reading in the lab

11 Jan

The winter break is a great opportunity to spend time in the lab with my students. One of the things we do is read papers. Last week, we spent a morning reading the following paper:

Triple-Antiretroviral Prophylaxis to Prevent Mother-To-Child HIV Transmission through Breastfeeding—The Kisumu Breastfeeding Study, Kenya: A Clinical Trial. PLoS Medicine, 2011. Thomas, Masaba, Borkowf, et al. 

The paper shows that antiretroviral drugs taken by an HIV-infected mother help prevent transmission to the baby through breastfeeding. The reported rates of HIV infection of the infants during breastfeeding were less than half the previously reported rates from untreated women.

After everyone read the paper, and we all discussed it together, two students worked together to write an abstract and three students worked together to draw an abstract. Here are the results:

Abstract (by Kadie and Melissa)

The Kisumu Breastfeeding Study was a single-arm trial conducted with 522 HIV-infected pregnant women who took a triple antiretroviral regimen from 34 weeks of pregnancy to 6 months after delivery. The triple-ARV regimen consisted of zidovudine and lamivudine and either nevirapine or the protease inhibitor nelfinavir. The purpose of the study was to investigate how various ARV regimens given to mothers and/or their infants affect mother-to-child transmission of HIV.

Data collected showed that between 0 and 24 months, the cumulative HIV transmission rate rose from 2.5% to 7.0%. The cumulative HIV transmission or death rate was 15.7%. Three percent of babies born to mothers with a low viral load were HIV-positive compared to 8.7% of babies born to mothers with a high viral load. Similarly, 8.4% of babies born to mothers with low baseline CD4 cell counts were HIV positive compared to 4.1% of babies born to mothers with high baseline CD4 cell counts. Although these findings are limited by the single-arm design, this study supports the idea that a simple triple-ARV regimen given to HIV-positive pregnant women regardless of their baseline CD4 cell count can reduce MTCT during pregnancy and breastfeeding in a resource-limited setting.

Graphical abstract (by Olivia, Patricia and Dasha)



Jobs in physiology and CS at SFSU

16 Nov

There are two job searches that interest me this year on our campus. One is in our department (Biology), for an animal physiologist (the committee has already started looking at applications, so if you are interested, you need to be fast!). The link to the ad is here.

The second search is in the Computer Science Department, and the ad is here. They are looking for someone with a “background in the database area, but also in areas related to social networking and collaboration, mobile computing, cloud computing and/or human/computer interaction.”

Both jobs are open to candidates at the assistant or associate professor level.

SFSU is a great place to work. Here are all the reasons why I am happy to be at SFSU.

If you are interested in doing research, training an extremely diverse student body and living in San Francisco, you should apply! Shoot me an email if you have any questions (pennings at sfsu dot edu).




No programming background? No problem! Learn R

14 Jun

Guest post by Rosana Callejas


Can someone with no programming knowledge learn “R”? The answer is yes! My name is Rosana Callejas. I am a Physiology major and a recent graduate of San Francisco State University. I began to learn the programming language “R” at the beginning of February of this year. Despite not having any previous programming experience, I analyzed my first data set of more than 20,000 data points in only a couple of months. Would you like to learn how I did it? Stay tuned.

The power of “R”

So what exactly is “R”? It is a programming language used by many data analysts, scientists, and statisticians to analyze data and to perform statistical analysis with graphs and figures. “R” is a great tool when analyzing large data sets. It has many additional packages that can be downloaded, which allow the user to expand or simplify commands when analyzing data.

How R coded its way into my heart

Dr. Pleuni Pennings, an evolutionary biologist and Professor at SFSU, introduced me to this wonderful tool. “I do all my research on my computer,” Dr. Pennings said, as she showed me the open program. At first, the idea puzzled me. In all my years as a biology student, I had never met a biologist like Dr. Pennings, who has made many discoveries from analyzing HIV DNA sequences using R. She explained to me that there is an accumulation of data collected by scientists every day waiting to be analyzed. Therefore, there is a need for scientists with the skills to interpret and draw conclusions from such large data sets. This interested me as a biologist. I imagined all the new findings that could be made if all the data collected was analyzed. It would definitely contribute to the advancement of science. With this in mind, I embarked on the adventure of learning R.

One command at a time

I began by taking the online course “Exploratory Data Analysis with R” on Udacity.com. The course is composed of 6 lessons, in which I first learned the basics of R and a few basic commands, followed by the analysis of one variable and how to make simple plots. In my learning, I used R and RStudio, both of which can be downloaded for free online. I also used data sets provided by Udacity to analyze. In addition, R comes with other data sets I practiced with. My first graphing assignment was a simple bar plot (Figure 1) that represented friend count for Facebook users of different ages. This task required the package “ggplot2”, which allows graphing.


Figure 1. Friend count as a function of age.

As I learned more, I began to work with different packages and new commands, and to make better graphs. I discovered how to add color to the graphs. I learned how to order variables, make subsets, group variables, add new columns to my data sets, work with multiple variables, run correlation tests, and much more. The following are some figures that followed that first one, and show the progress of my learning as I added more detail to that first plot throughout the course.


Figure 2. Median friend count as a function of age by gender.


Figure 3. Friend count as a function of age. In the green graph each point represents 20 data points in the data set. The black line represents the mean friend count. The blue line represents the 50th percentile (the median). The dotted lines represent the 90th and 10th percentiles.


Figure 4. The top graph represents friend count as a function of age in months, with the blue line representing the mean. The middle graph represents friend count as a function of age, with the blue line representing the mean. The bottom graph represents friend count vs. age in months rounded, multiplied, and divided by 5.


Patience is the mother of all virtues

Learning R was definitely a challenge. Commands that in theory should work sometimes did not work. As a new user, it was difficult to know exactly what had gone wrong. Fortunately, I had the guidance of Dr. Pennings, who helped me through the process. I also looked for resources outside of Udacity. One great package to use along with R is “swirl,” which is a teaching package. With swirl, I learned commands not taught in the Udacity course. It has multiple lessons that give the user immediate feedback. Patience and persistence are key to learning R. Now that I have seen what R can do, I know it was worth learning.

The possibilities are endless

My favorite feature of R is that the code used in a previous analysis can be saved and reused. R users can also share pieces of code with one another, which helps expand the knowledge among users. If changes need to be made in the middle of an analysis, this is rather simple, and there is no need to reanalyze the data. R can be used to study many different types of data of any size or background. Scientists such as Dr. Pennings make major findings in Biology using R.

Although new to R, I was able to begin the analysis of my own data set [1] within only a few months of learning about it. Below is a figure which resulted from the question: Which HIV regimens are most common and in what years? In order to answer this question, many hours of work were invested in preparing the data set, excluding undesired data points, subsetting, color coding, etc., ending up with 6255 HIV data points, which included only the 26 most common unique regimens as a function of time. The graph represents the most common regimens of HIV treatments taken by patients in different years. It is also organized in order of increasing number of drugs per regimen. Each regimen was color coded by whether it includes an NNRTI drug, includes a PI drug, or consists of nRTIs.

Figure 5. The graph represents the most common regimens of HIV treatments taken by patients in different years belonging either to NNRTI, nRTI, or PI.


As the graph shows, in 1989 and the early 1990s, HIV treatment consisted of the single drug AZT, and later, in 1997, NVP. As the years progressed, regimens composed of two drugs became more common. It isn’t until 1996 that we begin to see regimens composed of three drugs. Regimens composed of three drugs are the most abundant and continue to be taken by patients up to 2013, while the single-drug treatments seem to have ceased in 2008. In 2002, we first observe regimens composed of four drugs (although RTV is often not counted as a drug, so these regimens may be considered 3-drug regimens as well), which also continue to be used along with the three-drug regimens.

R is a great program for data analysis. I believe that anyone who would like to learn it can definitely do so with persistence. I will continue learning R and analyzing my data set. I hope to use it as a tool for future investigations in my career.

[1] Thanks to Dr Robert Shafer from Stanford University for sharing the data with us!

15 papers on contemporary evolution in human viruses

29 May

In the fall semester of 2014 I taught a reading seminar for master’s students at SF State on contemporary evolution in human viruses. This blog post contains a list of the papers we read in the seminar.

I posted about this seminar previously here (about the seminar format) and here (no powerpoint allowed), and here (about being nervous for a talk).

The students’ work can be read and seen here (about H5N1), here (polio outbreak), here (Dengue), here (Ebola), here (HIV in court), here (doing my own homework), here (the origin of HIV), here (on bad small things) and here (Hep B).

These are the papers we read:

1. Fast evolution of drug resistance in an HIV patient in the 1980s


Resumption of HIV antigen production during continuous zidovudine treatment. Lancet. 1988 Feb 20;1(8582):421.
Reiss P, Lange JM, Boucher CA, Danner SA, Goudsmit J.

2. HIV: Doctor infects his ex-girlfriend, phylogenetic evidence in court


Metzker, Michael L., et al. “Molecular evidence of HIV-1 transmission in a criminal case.” Proceedings of the National Academy of Sciences 99.22 (2002): 14292-14297.

3. Very contemporary: the genomics of the West-African Ebola epidemic


Gire, Stephen K., et al. “Genomic surveillance elucidates Ebola virus origin and transmission during the 2014 outbreak.” Science 345.6202 (2014): 1369-1372.

4. Using phylogenetics to determine origin of Dengue-3 outbreak in Australia

An explosive epidemic of DENV-3 in Cairns, Australia. PLoS One. 2013 Jul 16;8(7):e68137. doi: 10.1371/journal.pone.0068137. Print 2013. Ritchie SA, Pyke AT, Hall-Mendelin S, Day A, Mores CN, Christofferson RC, Gubler DJ, Bennett SN, van den Hurk AF.

5. Classic paper from Beatrice Hahn’s lab on origin of HIV-1


Gao, Feng, et al. “Origin of HIV-1 in the chimpanzee Pan troglodytes troglodytes.” Nature 397.6718 (1999): 436-441.

6. Timing the start of the HIV-1 pandemic


Korber, Bette, et al. “Timing the ancestor of the HIV-1 pandemic strains.” Science 288.5472 (2000): 1789-1796.

7. Where did the polio outbreak in Dominican Republic and Haiti come from?


Kew, Olen, et al. “Outbreak of poliomyelitis in Hispaniola associated with circulating type 1 vaccine-derived poliovirus.” Science 296.5566 (2002): 356-359.

8. Within-patient evolution of vaccine-derived polio virus


Martín, Javier, et al. “Evolution of the Sabin strain of type 3 poliovirus in an immunodeficient patient during the entire 637-day period of virus excretion.” Journal of Virology 74.7 (2000): 3001-3010.

9. Hepatitis B within-patient evolution


Lim, Seng Gee, et al. “Viral quasi-species evolution during hepatitis Be antigen seroconversion.” Gastroenterology 133.3 (2007): 951-958.

10. Permissive mutations and the evolution of drug resistance in Influenza


Bloom JD, Gong LI, Baltimore D. Permissive Secondary Mutations Enable the Evolution of Influenza Oseltamivir Resistance. Science (New York, NY). 2010;328(5983):1272-1275. doi:10.1126/science.1187816.

11. Controversial experiments on H5N1 Influenza


Airborne transmission of influenza A/H5N1 virus between ferrets. Science. 2012 Jun 22;336(6088):1534-41. doi: 10.1126/science.1213362.
Herfst S, Schrauwen EJ, Linster M, Chutinimitkul S, de Wit E, Munster VJ, Sorrell EM, Bestebroer TM, Burke DF, Smith DJ, Rimmelzwaan GF, Osterhaus AD, Fouchier RA.

12. Influential study on treatment to prevent HIV


Grant, Robert M., et al. “Preexposure chemoprophylaxis for HIV prevention in men who have sex with men.” New England Journal of Medicine 363.27 (2010): 2587-2599.

13. HIV drug resistance in women in Africa who were treated to prevent mother-to-child transmission


Eshleman, Susan H., et al. “Nevirapine (NVP) resistance in women with HIV-1 subtype C, compared with subtypes A and D, after the administration of single-dose NVP.” Journal of Infectious Diseases 192.1 (2005): 30-36.

14. Evolution of Acyclovir resistance in Varicella-Zoster Virus


Morfin, Florence, et al. “Phenotypic and genetic characterization of thymidine kinase from clinical strains of varicella-zoster virus resistant to acyclovir.” Antimicrobial agents and chemotherapy 43.10 (1999): 2412-2416.


15. Soft and hard sweeps during evolution of drug resistance in HIV

Loss and recovery of genetic diversity in adapting populations of HIV. PLoS Genet. 2014 Jan;10(1):e1004000. doi: 10.1371/journal.pgen.1004000. Epub 2014 Jan 23.
Pennings PS, Kryazhimskiy S, Wakeley J.

A reading seminar where every student reads, writes and contributes to the discussion in class

16 Jan

I remember reading seminars as follows: one student spends the entire week preparing for a powerpoint presentation, which often turns out to be stressful for the student and somewhat boring and uninformative for the audience. The other students only glance over the paper, so any discussion quickly falls flat. I therefore decided to have multiple short presentations without powerpoint (less preparation, more fun to listen to, plus repetition is good for learning a skill). I also decided to use short writing assignments as homework to make sure that all students were prepared to contribute to the discussion in class. At the same time, I wanted to keep things manageable for everyone.

1. Learning to present: every student does multiple short presentations without powerpoint.

No powerpoint: I didn’t want students to spend too much time preparing a presentation. I believe that often, when students spend a lot of time preparing presentations, they focus too much on making powerpoint slides and not enough on informing the audience and telling a story.

Short presentations: Doing an engaging 45-minute presentation is extremely difficult, and a skill that most postdocs don’t have, so why do we use 45-minute presentations in our graduate seminars? I decided instead to let each student do three 10-minute presentations.

Feedback: After each presentation the presenters got feedback (from the other students and myself), so that they could improve their presentation skills during the semester.

Easy listening: An added benefit of 10-minute presentations is that they are much easier for the audience. Each week started with three student presentations: one on the background and main question of the paper, one on the data and the results of the paper, and one on the conclusion and implications of the paper.

2. Practice writing: every student does a different writing assignment every week.

Graded homework each week: A paper discussion can only work if people have read the paper. If students don’t read, they may spend most of their energy trying to hide that they didn’t read (I know, I was in that situation!). So even though I understand that life and research get in the way of reading, I really wanted to make sure that the students were prepared for the seminar. To do that, I made every student do a written assignment every week that would count towards their grade (unless they were presenting that week).

A different assignment for each student: I had a long list of assignments so that each week, many different assignments were done AND so that over the course of the semester each student did many different assignments. This guaranteed that the students read the paper, but each with a different question in mind.

There were several types of written assignments. Descriptive: 1. Describe the background and main question of the paper, 2. describe the data and the results, 3. describe the conclusions, 4. describe which virus the paper is about. Critical: 5. What is your opinion of the paper? 6. What do you think the authors should have done differently? 7. Play the devil’s advocate: why should the paper not have been published? Summaries: 8. Summarize the paper in your own words, as if writing to a friend, 9. summarize the paper using only the most common 1000 words of the English language, 10. summarize the paper in a graphical abstract, 11. summarize the paper in a tweet. Meta: 12. Who are the authors of the paper? 13. How often is the paper cited, do you think it is influential?

Short! Each written assignment could not be more than 150 words, to keep the workload manageable for me and for the students.

Surprisingly hard: Some of the assignments were harder than the others. Summarizing the paper using only the 1000 most common words from the English language turned out to be very hard, but some of the students did a great job (see here and here). The graphical abstract was also hard for some students, but others liked it just because it was so different from their usual work (see here and here). The “devil’s advocate” writing assignment was always very interesting to read.

Easy: Grading the written assignments was quite easy. I simply gave a plus or minus for 5 categories (answered the question, scientific accuracy, clarity, grammar and word count).

Revisions allowed: After a request from a student, I decided that the students could redo any assignment where they had gotten less than 100% because I believe that feedback is most useful when it can be applied to a revision.

3. Promoting equity: thanks to the written assignments, every student could contribute to every class.

Everyone contributes: One of the nice things about the homework schedule with different assignments for everyone is that in class, I could ask each student about their homework. This way, each student contributed to the class, promoting equity, and the brief discussions of the homework assignments always led to questions from other students. Even if I didn’t ask, some students would volunteer to share information they found while they researched for their homework. For example, I remember someone remarking at the end of a presentation: “In your presentation, you said this result may be very important, but I found that the paper hardly has any citations even though it was published ten years ago, so I think it may not have been picked up by anyone.”

Sharing homework: I also encouraged the students to share their written assignments on the online forum we had for the class, so that the other students (and not just me) could read them. Sometimes they led to interesting forum threads. I also published some of the written assignments on my blog, after asking the students for permission. This way even more people could enjoy them.

Hep B study graphical abstract using paper and pens

6 Jan

One of the most fun things about teaching a grad seminar last semester was reading the homework assignments. Seriously!

Before I move on to the next semester (teaching genetics for undergrads), I wanted to share one more homework assignment. This one is by Emily Chang, a graduate student in Scott Roy’s lab. The paper about viral quasispecies in Hep B was one of the harder ones for the students, but this graphical abstract very neatly sums up the main results. I also love that Emily used old-fashioned paper and pens to make the abstract, knowing that using fancy drawing software isn’t needed to communicate science.

Graphical abstract by Emily Chang


Student blog posts: Dangerous H5N1 strain made airborne

17 Dec

A few weeks ago my students wrote short essays about the infamous Herfst et al. 2012 paper on airborne Influenza A in ferrets. For months, this paper was not published (even though it was accepted for publication) because it was unclear whether the results should be published at all, for fear that terrorist groups would use it to create a dangerous flu strain (see here). In my class, all students read the same paper, but they each have a different assignment. 

Figure 4 of Herfst et al 2012 Science.


Peter Manzo: My opinion

I thought this paper was a little slow but it was very interesting. The idea of an airborne virus has plagued mankind for centuries and according to the article, there is a possibility for viruses to mutate enough to become airborne. I liked how the article explained in detail what influenza is and its nomenclature. I thought it was interesting that the research group was able to produce an airborne virus but I do not understand why they would help a virus evolve to that state. I think the results are important but I wonder if the experiment will be redone.

Eduardo Lujan: The main conclusion of the paper

The main conclusion of the paper was that A/H5N1 influenza virus has the capability to become airborne transmissible in ferrets. Studies such as the one conducted in the paper are denoted as “gain of function” and the authors used this approach to genetically modify A/H5N1 virus and then used the modified virus during serial passage in ferrets. The authors concluded that four amino acid substitutions in the hemagglutinin protein and one mutation in the polymerase complex were all present in airborne-transmitted virus isolates. This paper is extremely relevant to health and medicine because it holds the potential to provide insight into a virus’s capacity to become airborne and cause explosive disease, and this information will allow scientists to begin developing therapeutics to alleviate such a situation. I do not believe that this paper will have an impact on current patients because the research carried out in this study did not lead to any novel treatments.

Graham Larue: The data that were used in the paper

In this paper, the authors wanted to investigate the possible mutations in avian influenza A/H5N1 which could lead to the possibility of airborne transmission between humans. In order to test this, the investigators performed targeted mutagenesis and serial virus passage in ferrets to determine whether the mutations made provide a sufficient substrate to allow for development of airborne transmission. The primary source of data for the experiment(s) came from throat and nasal swabs, as well as nasal washes which were then tested for viral load via end-point dilution in canine kidney cells. For the serial passage experiments, such samples were collected for each individual in the transmission chain. Viral quasi-species from each sample were characterized using 454 pyrosequencing, and viral genomes obtained using Sanger sequencing for experiment 4. In total, this paper used a variety of genetic, immunologic, molecular and bioinformatic (sequencing analysis) techniques to address the question of airborne transmission acquisition in avian influenza. There is scant detailed discussion about any of the individual analyses used in this paper, but clearly some amount of basic statistics must have gone into the generation of significance values and the like.