Archive | March, 2023

Scientist spotlight: supervised and unsupervised methods for microbiome data analysis with Dr Nandita Garud 

7 Mar

I got to know Nandita Garud when she was a PhD student in the biology department at Stanford and I was a postdoc there. While we were in the same lab, we got to collaborate on two papers: one about population genetics and drug resistance evolution and one about rats in New York City. After finishing her PhD, Nandita worked at UCSF as a postdoc and then took a job as an assistant professor at UCLA. You can read more about her interesting work on the microbiome, fruit flies and other topics on her website. I asked her about a recent paper on using supervised and unsupervised methods to analyze microbiome data. 

Image: Headshot of Dr Nandita Garud, assistant professor UCLA

Pleuni: Hi Nandita! Thanks for taking the time to chat with me! Can you tell me in a few sentences what your job is?

Nandita: Hi Pleuni! Thank you so much for inviting me to chat about my work. I am an assistant professor in the Department of Ecology and Evolutionary Biology at UCLA. My research is on understanding the evolutionary dynamics of natural populations, currently with a focus on the human microbiome, but I also work on Drosophila and other organisms! My research group (or ‘lab’) consists of several PhD students who perform computational work to understand how natural populations evolve. 

Pleuni: So, you consider the community of microbes that live in my intestinal tract as a natural population, is that right? And they evolve? 

Nandita: That’s correct. I consider populations that live outside a test tube in the lab to be natural populations. Interestingly, gut microbiota can evolve on timescales as short as one day, even in the absence of a selective pressure like antibiotics!

Pleuni: I saw that you published a paper about supervised and unsupervised methods for background noise correction in human gut microbiome data. Could you explain what the human gut microbiome is? And why you need background noise correction for it?

Nandita: The human gut microbiome is a complex community that is composed of hundreds of microbial species coexisting and interacting with one another. The human microbiome is known to play an essential role in health, and changes in the microbiome are associated with numerous diseases like diabetes, obesity, and inflammatory bowel disease. Being able to predict disease status from the human microbiome is important for helping individuals diagnose any illnesses they may have. One major complication, however, is that technical variables, such as how the DNA was extracted from the sample, can introduce noise in the data, making it harder to predict human phenotypes. So, background noise correction is an important approach for addressing this data heterogeneity so that more reliable predictions can be made. 

Pleuni: Thanks! In the new paper from your lab, you compare supervised methods (which are currently standard for noise correction) and unsupervised methods (which have not been applied to microbiome data). What is the difference here between supervised and unsupervised methods?

Nandita: Supervised methods are ones where a machine is shown labeled data and is trained to understand the differences between data classes. Unsupervised methods are ones where the machine needs to figure out on its own what groupings are present in the data. We use an unsupervised approach because we don’t always know what sources of noise contribute to variation in the data. 
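To make the distinction concrete, here is a minimal sketch of what a simple supervised correction could look like when the noise source is known. It assumes every sample carries a known batch label (for example, which machine sequenced it) and subtracts each batch's mean profile. The function name `center_by_batch` and the toy numbers are hypothetical, and this is an illustration of using labels, not one of the methods compared in the paper.

```python
import numpy as np

def center_by_batch(X, batch):
    """Subtract each batch's mean profile, then restore the overall mean.

    X is a (samples x features) array and batch holds one known label per sample.
    """
    X = np.asarray(X, dtype=float)
    batch = np.asarray(batch)
    corrected = X.copy()
    for label in np.unique(batch):
        rows = batch == label
        # The labels tell us which samples to average together.
        corrected[rows] -= X[rows].mean(axis=0)
    return corrected + X.mean(axis=0)


# Made-up abundances for four samples and two species, from machines "X" and "Y".
X = np.array([[1.0, 2.0], [3.0, 4.0], [10.0, 20.0], [12.0, 22.0]])
batch = np.array(["X", "X", "Y", "Y"])
print(center_by_batch(X, batch))
```

The correction only works because the labels are available. An unsupervised method has to find the unwanted structure without them.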

Pleuni: Okay, thanks! So, I imagine something like this: If microbial species A is always 2x as abundant in samples that were sequenced with machine X vs machine Y, then we can correct by changing the abundance of species A so that it matches between the two machines? Is that what’s happening? 

Nandita: Yes, but we aren’t explicitly adjusting the abundances; rather, we are throwing away variation due to noise. 

Pleuni: Does this mean that you do a dimension reduction method first and then throw away dimensions? 

Nandita: Exactly. We do PCA (principal component analysis) and then throw away the first few PCs (principal components) because they are usually correlated with noise. We do run the risk of throwing away signal too, but that’s the tradeoff in an unsupervised approach. Still, when we compare this unsupervised approach to the standard supervised approaches, it can work just as well in many scenarios! And the good thing is that this way we can correct for unidentified confounders. 
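The PCA idea described here can be sketched in a few lines of NumPy. This is a rough illustration under stated assumptions, not the paper's implementation: the function name `remove_top_pcs`, the choice of removing a single component, and the simulated data with a made-up batch offset are all hypothetical, and no abundance transformation is applied.

```python
import numpy as np

def remove_top_pcs(X, n_remove):
    """Remove variation along the first n_remove principal components of X.

    X is a (samples x features) array; the result has the same shape.
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    Xc = X - mean  # PCA works on mean-centered data
    # Rows of Vt are the principal axes, ordered from most to least variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    top = Vt[:n_remove]
    # Subtract the projection of every sample onto the discarded axes.
    return Xc - Xc @ top.T @ top + mean


# Hypothetical simulated data: 100 samples, 20 features, two processing batches.
rng = np.random.default_rng(0)
batch = np.repeat([0, 1], 50)
shift = 3.0 * rng.normal(size=20)  # made-up technical offset for batch 1
X = rng.normal(size=(100, 20)) + np.outer(batch, shift)

X_clean = remove_top_pcs(X, n_remove=1)

def batch_gap(data):
    """Distance between the mean profiles of the two batches."""
    return np.linalg.norm(data[batch == 1].mean(axis=0) - data[batch == 0].mean(axis=0))

print("batch gap before:", batch_gap(X))
print("batch gap after: ", batch_gap(X_clean))
```

The batch labels are used only to check the result; the correction itself never sees them. In this simulation the batch offset dominates the variance, so the first component lines up with it. If a real biological signal were the largest source of variation instead, the same step would discard that signal, which is the tradeoff mentioned above.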

Pleuni: Cool 😎 Thank you for explaining all of this, Nandita! 

I have one more question. What is something you like to do when you are not doing science? 

Nandita: I enjoy taking walks with my family and enjoying the outdoors in Los Angeles! 

Pleuni: Thank you Nandita! 

Here is a link to the paper:

The website of the Garud lab: