Tag Archives: scientist spotlight

Scientist spotlight: supervised and unsupervised methods for microbiome data analysis with Dr Nandita Garud 

7 Mar

I got to know Nandita Garud when she was a PhD student in the biology department at Stanford and I was a postdoc there. While we were in the same lab, we got to collaborate on two papers: one about population genetics and drug resistance evolution and one about rats in New York City. After finishing her PhD, Nandita worked at UCSF as a postdoc and then took a job as an assistant professor at UCLA. You can read more about her interesting work on the microbiome, fruit flies and other topics on her website. I asked her about a recent paper on using supervised and unsupervised methods to analyze microbiome data. 

Headshot of Dr Nandita Garud, assistant professor UCLA

Pleuni: Hi Nandita! Thanks for taking the time to chat with me! Can you tell me in a few sentences what your job is?

Nandita: Hi Pleuni! Thank you so much for inviting me to chat about my work. I am an assistant professor in the Department of Ecology and Evolutionary Biology at UCLA. My research is on understanding the evolutionary dynamics of natural populations, currently with a focus on the human microbiome, but I also work on Drosophila and other organisms!  My research group (or, ‘lab’) consists of several PhD students that perform computational work to understand how natural populations evolve. 

Pleuni: So, you consider the community of microbes that live in my intestinal tract as a natural population, is that right? And they evolve? 

Nandita: That’s correct. I consider populations that live outside a test tube in the lab to be natural populations. Interestingly, gut microbiota can evolve on even 1-day timescales, even in the absence of a selective pressure like antibiotics!

Pleuni: I saw that you published a paper about supervised and unsupervised methods for background noise correction in human gut microbiome data. Could you explain what the human gut microbiome is? And why you need background noise correction for it?

Nandita: The human gut microbiome is a complex community that is composed of hundreds of microbial species coexisting and interacting with one another. The human microbiome is known to play an essential role in health, and changes in the microbiome are associated with numerous diseases like diabetes, obesity, and inflammatory bowel disease. Being able to predict disease status from the human microbiome is important for helping individuals diagnose any illnesses they may have. One major complication, however, is that technical variables, such as how the DNA was extracted from the sample, can introduce noise in the data, making it harder to predict human phenotypes. So, background noise correction is an important approach for addressing this data heterogeneity so that more reliable predictions can be made. 

Pleuni: Thanks! In the new paper from your lab, you compare supervised methods (which are currently standard for noise correction) and unsupervised methods (which have not been applied to microbiome data). What is the difference here between supervised and unsupervised methods?

Nandita: Supervised methods are ones where a machine is shown labeled data and is trained to understand the differences between data classes. Unsupervised methods are ones where the machine needs to figure out on its own what groupings are present in the data. We use an unsupervised approach because we don’t always know what sources of noise contribute to variation in the data. 

Pleuni: Okay, thanks! So, I imagine something like this: If microbial species A is always 2x as abundant in samples that were sequenced with machine X vs machine Y, then we can correct by changing the abundance of species A so that it matches between the two machines? Is that what’s happening? 

Nandita: Yes, but we aren’t explicitly adjusting the abundances; rather, we are throwing away variation due to noise. 

Pleuni: Does this mean that you do a dimension reduction method first and then throw away dimensions? 

Nandita: Exactly — we do PCA (principal component analysis) and then throw away the first PCs (principal components) because they usually are correlated with noise. We do run the risk of throwing away signal too, but that’s the tradeoff in an unsupervised approach. But when we compare this unsupervised approach to the standard supervised approaches, it can work just as well in many scenarios! And the good thing is that this way we can correct for unidentified confounders. 
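To make the idea concrete, here is a minimal sketch of "do PCA, then throw away the first PCs" in Python with NumPy. It is only an illustration, not the code from the paper: the function name `remove_top_pcs`, the toy data, and the choice to remove exactly one component are all assumptions made for this example.

```python
import numpy as np

def remove_top_pcs(X, n_remove=1):
    """Return a copy of X with the variation along the first
    n_remove principal components projected out.
    (Hypothetical helper written for this illustration.)"""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    Xc = X - mean                      # center each feature (species)
    # SVD of the centered data: the rows of Vt are the principal axes,
    # ordered from most to least variance explained.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    top = Vt[:n_remove]                # shape (n_remove, n_features)
    # Subtract each sample's projection onto the top principal axes.
    Xc_clean = Xc - Xc @ top.T @ top
    return Xc_clean + mean

# Toy data (an assumption, not real microbiome data):
# 40 samples x 6 "species", with half the samples shifted by a
# large technical batch effect.
rng = np.random.default_rng(0)
n_samples, n_species = 40, 6
batch = np.repeat([0.0, 1.0], n_samples // 2)
signal = rng.normal(size=(n_samples, n_species))
X = signal + 10.0 * np.outer(batch, np.ones(n_species))

X_clean = remove_top_pcs(X, n_remove=1)

# Compare the average difference between the two batches before and after.
diff_before = abs(X[batch == 1].mean() - X[batch == 0].mean())
diff_after = abs(X_clean[batch == 1].mean() - X_clean[batch == 0].mean())
print(f"batch difference before: {diff_before:.2f}, after: {diff_after:.2f}")
```

In this toy setup the batch effect dominates the first principal component, so removing that component removes most of the batch difference. As Nandita notes, any real signal that happens to lie along the removed components is thrown away too.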

Pleuni: Cool 😎 Thank you for explaining all of this, Nandita! 

I have one more question. What is something you like to do when you are not doing science? 

Nandita: I enjoy taking walks with my family and enjoying the outdoors in Los Angeles! 

Pleuni: Thank you Nandita! 

Here is a link to the paper: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1009838

The website of the Garud lab: https://garud.eeb.ucla.edu/

Scientist spotlight: meet Dr Sabah Ul-Hasan!

28 Apr

Dr Ul-Hasan (they/them) is a postdoc and lecturer in bioinformatics under Dr Andrew Su and Dr Dawn Eastmond at Scripps Research, doing biocuration and automated data integration work within the Gene Wiki project of Wikidata. They received their PhD in Quantitative & Systems Biology from UC Merced, their Master’s in Biochemistry from the University of New Hampshire and their BSc degrees (3 majors! Biology, Chemistry, and Environmental & Sustainability Studies) from the University of Utah. Sabah is involved in what feels like a thousand different activities related to science, research, coding, outreach, conservation, environmental justice and other things. 

I got to know Sabah a couple of years ago when I visited UC Merced and then started following them on twitter. One thing I really love about them is how they don’t limit themselves to just doing one thing. They are ambitious and radical. They founded the Biota project to connect underrepresented communities with nature. They are a filmmaker (see here)! They volunteer for The Carpentries, and they started the venom-microbiome research consortium. They organize workshops, speak at events, teach classes and do many other things. 

In my opinion, too few scientists use their platform to fight for justice and to share their passion and knowledge. At the same time, many PhD students and postdocs and even assistant professors are shy about taking a stance, thinking that they would speak up louder (about science or justice or both) when they are more senior. But Sabah proves that you don’t have to be a tenured professor to make a difference in science (they have more than 8000 followers on twitter, just saying). 

Pleuni: Hi Sabah, thanks for taking the time to answer my questions! Could you tell us in a few sentences how you became interested in data science? 

Sabah: One of my dissertation chapters involved data that was over 100 years old. I know this isn’t a new concept for anyone doing paleo research. I was also well-familiar with “old” data through all the climate change reports that have come up in the public over the years. 

However, to directly work with data like that I realized there were so many more questions I wanted to ask people from 100 years ago. That then got me wondering, “How can I contribute to research in a way that can be sustainable 20, 50, or even 5 years from now?” 

My interest in data science thus came from a position of wanting to be part of something bigger in terms of the infrastructure for how we can sustain the science of today and tomorrow. 

Pleuni: How did you start learning coding skills? Was it hard for you to learn? 

Sabah: I was first introduced to R during my (Biochemistry) Master’s at the University of New Hampshire in 2013. I sat-in on a casual meeting among graduate students and postdocs and truly had no idea what anyone was talking about. 

The data analysis section of my MSc thesis ended up utilizing Excel to make bar charts. In retrospect, I see how much faster I could’ve done the analyses if I took the time to learn coding. When I began the doctoral program at UC Merced in January 2015, I knew coding was a skill I wanted to learn and so I did through classes and workshops. 

Now it’s my job as a postdoctoral scholar and lecturer for bioinformatics, and I still sometimes struggle with basic concepts. The difference between then and now is I’m a lot better at admitting when I don’t know something, how to ask a question for what I need to learn, and where to go to find that answer. 

I’m not sure anyone who does bioinformatics considers themselves an expert, but perhaps the expertise lies within the ability to problem solve especially when it is difficult or can feel overwhelming. In sum, the sooner you can confront your fears the better! Don’t let them freeze you. Believe in your ability to constantly learn and grow, even when you’re a titled expert!

Pleuni: For your paper that appeared in Plos One in 2019, you studied the diversity of microorganisms (including archaea, bacteria and eukaryotes) in seawater and sediment in three different locations. It sounds like a complex dataset to work with. 

Community ecology across bacteria, archaea and microbial eukaryotes in the sediment and seawater of coastal Puerto Nuevo, Baja California

Sabah: It’s funny to only be two years out from that publication and already think of so many things I would’ve done differently. I guess that’s growth! 

I attribute a lot of credit and thanks to the co-authors of the paper and those in the acknowledgements. It came a long way from when I first drafted it to the final publication form, and posting it on bioRxiv also helped a great deal in soliciting feedback. 

What I think really makes a difference is the transparency of that research and associated code, especially in reference to data clean-up (which is the bulk of the analysis work, in my personal opinion). I’ve since received several inquiries from people about their own work, and it feels great to know that it can serve as something people can apply to their own research to make things a little easier. 

I also think it’s important we as scientists specify the microbes we’re investigating in any ‘microbial community’ -type paper. Many of the amplicon and metagenomics studies I see really focus on bacteria or fungi, which is absolutely fine but that isn’t a comprehensive microbial community for what many of the titles for these papers tend to imply. In this study, too, we focus on whatever microbial groups we identified solely through 16S and 18S. We need to be better at saying what the data is rather than wordsmithing for a nice story. That will help the next group build upon those gaps for something stronger next time, and overall our intent as scientists is to always have research be advancing further and further. Right? 

Pleuni: You used R for your data analysis (but also other software such as QIIME2). What do you like or not like about R? Could you imagine doing a paper like this one without R?

Sabah: Using wrappers such as QIIME2 and mothur is great for people who want to do an analysis of a microbial dataset and then perhaps never touch one again. For me, I found myself continuously asking a lot of “Why?” and wanting to dig deeper into the fundamentals behind the software I was using. In the end, R took more time to learn short-term but made more sense to me of what was happening each step of the way in the analysis. It was also a good way to affirm my results in trying different avenues and seeing the same output. 

What I learned from putting together the paper is it’s not about finding the ‘right’ or ‘wrong’ answer, it’s about finding an answer that is logical and as unbiased as possible. A lot of the time we have these hypotheses we ‘prove’ through confirmation bias. To me, code (when done with intention) is a way to step outside of ourselves and see what the data is telling us rather than what we want the data to say — and that’s where the interesting science lives.

This publication, for example, wasn’t exactly what we were wanting to see. It’s actually a failed attempt at sequencing the venom microbial community of Californiconus californicus, which was the focus of my dissertation (venom microbiomes), due to too much host contamination of the tissues we sampled for that region of Puerto Nuevo. So, what do we do? Do we call it all a wash? There was a lot of thought, time, and resources that went into that work. 

I had sampled the sediment and water of the area, along with some generic chemistry tests, to see if the venom microbial community was largely specialized to the snail venom glands or from the surrounding environment (they burrow in the sand). That data was still usable, had good replication, and we didn’t know anything about the microbial community of Puerto Nuevo before that point. Ah-ha! A different story than we were thinking, but still a valuable one. Let the data tell you, don’t misconstrue the data to fit your narrative. 

R, and all the programming languages I’ve learned thus far, have helped me learn that.

Pleuni: On your twitter profile, you list many interests, such as advocacy, consulting, data visualization. Can you tell us a bit about your different interests? Are these things linked to each other?

Sabah: Well… haha. The link is that, at heart, I’m a bit of a troublemaker. It’s the nature of a scientist to ask a lot of questions, and asking too many questions can often get us into trouble! I likewise enjoy being asked a lot of questions, and hope to always maintain humility in learning just as much from high school students as I do from tenured professors. 

I wanted my Twitter profile and bio to emulate that duality of being both a ‘credible academic’ while also pushing back on what we define as ‘the norm’. I disagree with the idea that a science expert needs to possess a PhD (or some other form of higher education certification) because of the privilege and whiteness involved, but I do also benefit from it after completing the process and there is of course also danger in believing ‘just anyone’ on the internet. And I love learning and helping, which are really the only drivers behind all my many interests.

In my view, the most important quality in being a scientist is being approachable. If only a few people can understand the work you do, then what’s the point? That’s why I’m on Twitter, and also as a way to keep myself grounded, especially learning from moments of being called out (which does happen from time to time). I’d also say my family keeps me in check, as I’m one of the few with a science background. I have one cousin on my Mom’s side with a Ph.D. and that’s it for our extended family of over 100 people (South Asian families are big). Being a good scientist is just as much about humanity as it is about the basic research. I think only good things can come from staying tuned into the reality of the world around us, even though it can feel like a lot to balance.

Pleuni: Do you have any advice for the bio and chem Master’s students in my Data Science class? 

Sabah: My advice is to just go for it! 

This past Fall I taught a bioinformatics course to (mainly) graduate students and it was an adventure for all of us. It was my first time as a full instructor for a course (versus a teaching assistant), during COVID no less, and it was also the first time many students in the course were getting into bioinformatics. 

At the end, it was clear to me that student progress in the course wasn’t about who knew how much at the start but rather about showing up with enthusiasm and simply trying. That went both ways for me as the instructor giving lectures my all as well as for the students and their performance. And life happens! I had to cancel one of the days due to personal life things, and that’s okay. Be good to yourself when you need to and also don’t hold yourself back. And be good to others, too. We really never know what someone else may be experiencing behind the scenes for them to be flakey or on edge, and the more we can find the good in each other the better we can focus on doing the good science. 

On that note, I can’t express enough how much of a difference it’s made in my life to work for or alongside even just one considerate person. As they say, “You are what you eat.” My PhD co-advisors (Dr Tanja Woyke and Dr Clarissa Nobile) and my current PIs (Dr Su and Dr Eastmond) are truly outstanding people. They have so many stresses in their own careers and lives, and they still somehow show up with kindness and professionalism every day. And they also believe in me to do good work, even when I’ve had a bad week (or month!). That trust really goes such a long way when you’re underrepresented in your field, and often used to being discouraged and/or people expecting very little of you. Being entrusted to teach a course at a renowned research institute directly out of my PhD, for instance, is a big reason why I chose this position in knowing that my voice was heard and respected. That’s been true throughout, and makes it much easier to show up with my best foot forward even on the tough days.

Tying it all together, so many times I’ve got myself stuck because I see others who are ahead of me, doing better than me, and/or with access to more resources than me. One truth we can all agree upon is that life is unfair, and while hopefully it will become equitable over time through our own efforts to create change, the fact is that life is still happening in the meantime. No one will help you as much as you can help yourself, and the moments where I’ve been able to just sit down and see something through are how I’ve realized more and more just how much more ability I have than I thought. You’re much more capable than you give yourself credit for! It’s super cheesy, but it’s very true. And feel free to reach out any time!

Pleuni: Thanks for answering my questions, Sabah! So much here that resonates with me, including one of the last things you said, that you realized that you have more ability than you thought. This happens to me too! As just one example, just over a year ago, I didn’t think I could learn Machine Learning, but now I am even teaching it. Not that I am suddenly an expert, but I can do it and it is no longer scary. 

I look forward to seeing all the science, art, and justice-related projects you will be doing in the future! 

Links

Sabah Ul-Hasan Google Scholar profile 

Sabah Ul-Hasan, PhD Twitter Profile (@sabahzero)

Meet Simone Webb, Bioinformatics and Immunology PhD student

2 Dec


I am spotlighting scientists who code for my students who are learning to code in Python. Today, I’ve chatted with Simone Webb from the UK. Simone Webb is a PhD student in the group of Professor Haniffa at Newcastle University in the UK.

Pleuni: Hi Simone, how did you get into coding?

Simone: I got into coding during my undergraduate degree, where I took some compulsory statistics and intro to bioinformatics courses.

To be honest, I struggled with it a lot! These courses remain my worst grades during university. However, there was something about it that drew me to it. The maths-based logic of it all really appealed to me at a time where the bio-related content I was learning seemed a lot more uncertain and up for debate. I’m not a natural at it by any means.

I liked how it felt to get an answer correct during our tutorials and stuck with it.

By the time I got to my undergraduate thesis, I realized that my real interest lay in microbiology and bioinformatics. The projects on offer for my thesis didn’t have massive diversity in these fields, so I crossed my fingers and applied for the project led by our first-year bioinformatics tutor – I got in! From then onwards, it’s fair to say that I would always choose coding over wet lab work. My thesis project was purely bioinformatics and I had a very encouraging and hands-on supervisor who was patient with me and taught me a lot to do with coding technique, method and reasoning. After I graduated I knew I wanted to keep coding, whether in research or a non-academic role.

Pleuni: What is your current job or project?

Simone: I’m currently studying for a PhD in bioinformatics and immunology. I now use coding (in both R and python languages) to analyze sequencing data. In this work, code is able to help us understand exactly what cells are present in both healthy and disease tissue, and helps us look further into the role these cells could be playing.

Pleuni: Do you have any advice for students who are starting to learn coding skills?

Simone: If you have an interest in anything bioinformatics related, my advice is to seek out a role model and be brave – ask for their advice and see what you can learn from their experiences! Also, there are active online communities for women in STEM, women who code and people who are Black in academia. Reach out if any of these groups relate to you and know that you are not alone.

You can find Simone on Twitter under her handle @SimSci9!

Scientist spotlight: Jazlyn Mooney, PhD student UCLA

25 Jan

Jazlyn Mooney grew up in Albuquerque, New Mexico. She went to high school and college there too (Eldorado High School and University of New Mexico).

Sketching science created a lasting interest

“I became interested in science in middle school. I had a science teacher, Mr. Pecknik, who made us draw everything we learned about (from central dogma to phylogenies) for class. So we kept a sketch book for our science class and I thought it was super cool.”

Not “cut out for MD/PhD”?

Becoming a researcher didn’t always seem possible for Jazlyn. One summer, when she was an undergrad, she participated in an MD/PhD prep program. At the end of the summer, her summer advisor told her that she wasn’t cut out to be an MD or a PhD! Fortunately, she didn’t listen to him but instead listened to her other undergrad advisor, her family and herself and decided to continue her path to become a scientist! She did research as an undergraduate and then applied to PhD programs.

The history of Latin American populations

Jazlyn is now a PhD student at UCLA in the lab of Dr. Kirk Lohmueller and works to better understand the history of human populations using genetic data. She recently published a paper entitled: “Understanding the Hidden Complexity of Latin American Population Isolates.” In this paper she showed how Costa Rican and Colombian people are descended mostly from European males and Amerindian females, and a small number of African individuals.

The field that uses genetic data to understand the history of populations is called “population genetics”. Jazlyn got interested in population genetics when she was an undergrad and got an opportunity to do research with Dr Jeff Long.

Learning new things and presenting at meetings

Jazlyn loves learning new things, and her favorite part of being a researcher is that it allows her to keep learning and to create new knowledge. Jazlyn has presented her work at many conferences, including the University of Chicago Research Forum, the meeting of the American Society of Human Genetics, and the Bay Area Population Genomics meeting at UC Santa Cruz in 2018.

Links

Link to paper about the history of people in Costa Rica and Colombia

Link to a free “preprint” version of the same paper

Tacos, R and Twitter

Jazlyn’s favorite coding language: R

Jazlyn’s favorite food: Tacos

Jazlyn’s Twitter handle: @Jazlyn_Mooney