Tag Archives: Haldane’s Sieve

Thoughts on arXiv and journals

9 Jul

One of the best things about working at Stanford is having lunch outside with my colleagues almost every day. Last Friday it was fairly cold (70 degrees or so, about 21°C), but we are a tough bunch and we sat outside anyway.

One of the newer people in the lab asked the others: “Do you publish your manuscripts on the arXiv?” What followed was a brief discussion of the pros and cons of publishing on the arXiv before a paper is published in a journal. Here is my summary.

Pros and cons of publishing on the arXiv

Pros

1. Science goes faster when we share our results faster.

2. Published papers will be better if more people can give feedback early on.

3. There is some evidence (though not from a randomized trial) that papers get cited more when they are first published on the arXiv.

4. Getting your paper “out there” before it is accepted by a journal takes away some of the stress of getting the paper accepted by a journal. Others can already see what you’ve done, and an arXiv-ed paper looks much better on your CV than “in preparation.”

5. In quantitative biology, the arXiv is cool and you will look like a modern 21st-century scientist if you publish on the arXiv. But don’t try to impress a physicist with your new-found arXiv-fondness, because they already used the arXiv before most current graduate students were born. If you are going for hip, consider publishing your preprint on Figshare, which allows you to keep track of traffic. PeerJ Preprints is another new option.

6. If you’re in evolutionary biology, you can benefit from exposure on Haldane’s Sieve if you publish on arXiv (or another preprint server).

Cons

1. The paper may still change a lot and you cannot remove the arXiv-ed version (though you can add a newer version, and I think it is unlikely that anyone looks at an old version).

2. Some journals don’t like to publish arXiv-ed papers, see this list: http://en.wikipedia.org/wiki/List_of_academic_journals_by_preprint_policy

3. If many people read the arXiv-ed version, they may not bother reading the improved journal-version.

Honestly, I am not too convinced of these cons.

So should we do away with publishing in peer-reviewed journals?

I don’t think so. Despite everything that is wrong with journals, I think they are very useful.
Ask yourself: when was the last time you really took the time to read through a paper by someone you didn’t know?
Right, I think that may have been when you were reviewing a paper! And chances are that you were reviewing that paper because an editor asked you. There is not yet a system – outside of journals – that makes sure that a paper gets read and scrutinized by at least a few people. When I tried to publish a somewhat controversial paper on HIV last year, I was annoyed with the peer review system, because I felt it was unfair to a newbie in the field. But without the review system, chances are that my paper would have been ignored entirely. If it weren’t for journals, how would a person who is not yet known in the field get the attention of the community?

Editors are important hubs in our scientific community

Of course, there are reviewers who do not take their task seriously, and there are scientists who do take time to read papers by unknown scientists even if they are not reviewing, but I bet that both are rather small minorities. I like to review papers, I am happy that my papers get reviewed, and I think that the editors who organize it all are important hubs in our scientific community. We shouldn’t do away with that!

New video on slavemaking ants

27 Mar

As I announced a few weeks ago, I have been working on a new video on my work on slavemaking ants. It is now ready and online!

In this video, we talk about our research on slavemaking ants and their hosts (slaves). The slavemakers are of one species (P. americanus) and the hosts of another species (T. longispinosus). Host ants can be captured by the slavemaker ants, and these captured ants (slaves) normally work for the slavemaker queen. But recently, it was found that they sometimes kill slavemakers (Achenbach and Foitzik 2009 and Pamminger et al. 2013). It is unclear why the slaves do this, because they probably cannot reproduce.

The video is based on the paper: “Oh sister, where art thou? Indirect fitness benefit could maintain a host defense trait” by Tobias Pamminger, Susanne Foitzik, Dirk Metzler and myself, which can be found here: http://arxiv.org/abs/1212.0790. Earlier, I wrote a blog-post about this paper for Haldane’s Sieve.

Susanne Foitzik, who is a professor in Mainz (and previously in Munich) and her students and colleagues have been working on this slavemaker-host system for many years. Another video of our work is here: Raiders from the sky.

The music for the video was taken from the Free Music Archive.

Palmer et al. find that HIV evolution is not so fast

18 Feb

Usually, I use this blog to write about how I do my work, but today I write about science! This post is cross-posted on Haldane’s Sieve, a website that provides a “feed of preprints in the fields of evolutionary and population genetics.”

Last week, a group of people from Oxford University published an interesting paper on the arXiv. The paper is about using genealogical data (from HIV sequences) in combination with cross-sectional data (on patient and HIV phenotypes) to infer rates of evolution in HIV.

My conclusion: the approach is very interesting, and it makes total sense to use genealogical data to improve the inference from cross-sectional data. In fact, it is quite surprising to me that inferring rates from cross-sectional data works at all. However, a previous paper by (partly) the same people shows that it is possible to infer rates from cross-sectional data only, and that the resulting estimates are very similar to the estimates from longitudinal data. The current paper provides a new and improved method, whose results are consistent with the previous papers.

The biological conclusion of the paper is that HIV adaptation is slower than many previous studies suggested. Case studies of fast evolution of the virus suffer from extreme publication bias and give the impression that evolution in HIV is always fast, whereas cross-sectional and longitudinal data show that evolution is often slow. Waiting times for CTL-escape and reversion are on the order of years.

The paper: 

Integrating genealogical and dynamical modelling to infer escape and reversion rates in HIV epitopes, Duncan Palmer, John Frater, Rodney Phillips, Angela McLean, Gil McVean, http://arxiv.org/abs/1302.1098

The previous paper: 

Modelling the evolution and spread of HIV immune escape mutants.
Fryer HR, Frater J, Duda A, Roberts MG; SPARTAC Trial Investigators, Phillips RE, McLean AR.
http://www.plospathogens.org/article/info%3Adoi%2F10.1371%2Fjournal.ppat.1001196

1. What rates are they interested in?

The rates of interest here are the rate of escape from CTL pressure and the rate of reversion if there is no CTL pressure.

When someone is infected with HIV, the CTL response by the immune system of the patient can reduce the amount of virus in the patient. CTL stands for cytotoxic T lymphocyte. Which amino-acid sequences (epitopes) can be recognized by the host’s CTL response depends on the HLA genotype of the host.
Suppose I have a certain HLA genotype X, such that my CTLs can recognize virus with a specific sequence of about 9 amino acids, let’s call this sequence Y. To escape from the pressure of these CTLs, the virus can mutate sequence Y to sequence Y’. A virus with sequence Y’ is called an escape mutant. The host (patient) with HLA X is referred to as a “matched host” and hosts without HLA X are referred to as “unmatched.” The escape mutations are thought to be costly for the virus.
So, for each CTL epitope there are 4 possible combinations of host and virus:
1. matched host and wildtype virus (there is selection pressure on the virus to “escape”)
2. matched host and escape mutant virus
3. unmatched host and wildtype virus
4. unmatched host and escape mutant virus (there is selection pressure on the virus to revert)

The question is “how fast does the virus escape if it is in a matched host and how fast does it revert if it is in an unmatched host?”

2. Why do we want to know these rates?

First of all, just out of curiosity, it is interesting to study how fast things evolve – it is surprising how little we know about rates of adaptive evolution. Secondly, escape rates are relevant for the success of a potential HIV vaccine: if escape rates are high, then vaccines will probably not be very successful.

3. What are cross-sectional data and how can we infer rates from them?

Cross-sectional data are snap-shots of the population, with information on hosts and their virus. Here, it is the number of matched and unmatched hosts with wildtype and escape virus at a given point in time.
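
To make this concrete, here is a minimal sketch of how such a snap-shot could be tabulated for one epitope. The records are invented for illustration (they are not data from the paper), and the function name is my own.

```python
from collections import Counter

# Hypothetical records for one epitope, invented for illustration only:
# (host is HLA-matched?, virus carries the escape mutation?)
records = [
    (True, True), (True, True), (True, False),
    (False, False), (False, False), (False, True), (False, False),
]

def escape_fractions(records):
    """Return the fraction of escape-mutant virus in matched and unmatched hosts."""
    counts = Counter(records)
    fractions = {}
    for matched in (True, False):
        n_escape = counts[(matched, True)]
        n_total = n_escape + counts[(matched, False)]
        label = "matched" if matched else "unmatched"
        # An empty group has no defined fraction, so report NaN for it.
        fractions[label] = n_escape / n_total if n_total else float("nan")
    return fractions

fractions = escape_fractions(records)
print(fractions)
```

These two fractions (escape virus in matched hosts and in unmatched hosts) are the only summary of the data that a purely cross-sectional analysis gets to work with.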

So how do these data tell us what escape rates and reversion rates are? Intuitively, it is easy to see how very high or very low rates would shape the data. For example, if escape and reversion happened very fast, then the virus would always be perfectly adapted: we’d only find wildtype virus in unmatched hosts and only escape mutant virus in matched hosts. Conversely, if escape and reversion were extremely slow, then the fraction of escape mutant virus would not differ between matched and unmatched hosts. Everyone would be infected with a random virus and this would never change.
The real situation is somewhere in between: the fraction of escape mutant virus is higher in matched hosts than in unmatched hosts. With the help of a standard epidemiological SI-model (ODE-model) and an estimate of the age of the epidemic, the fraction of escape mutant virus in the two types of hosts translates into estimates of the rates of escape and reversion. In the earlier paper, this is exactly what the authors did, and the results make a lot of sense. Rates range from months to years, reversion is always slower than escape, and there are large differences between CTLs. The results also matched well with data from longitudinal studies. In a longitudinal study, the patients are followed over time and evolution of the virus can be more directly observed. This is much more costly, but a much better way to estimate rates.
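
The intuition about very fast and very slow rates can be checked with a toy model. The sketch below is not the SI-model from the earlier paper; it is a deliberately simplified two-equation model that I made up for illustration. It assumes that infected hosts are replaced at a constant rate by new infections whose virus is drawn at random from the current population. All parameter names and values are hypothetical.

```python
def simulate(escape_rate, reversion_rate, matched_frac=0.2, turnover=0.1,
             start_freq=0.3, years=30.0, dt=0.001):
    """Euler-integrate a toy model of escape and reversion (rates per year).

    x_m and x_u are the fractions of matched and unmatched hosts that carry
    escape-mutant virus. Hosts are replaced at rate `turnover` by new
    infections whose virus is drawn at random from all infected hosts.
    """
    x_m = x_u = start_freq
    for _ in range(int(round(years / dt))):
        # Frequency of escape virus in the whole infected population.
        pop_freq = matched_frac * x_m + (1.0 - matched_frac) * x_u
        # Matched hosts: wildtype virus escapes; unmatched hosts: escape virus reverts.
        dx_m = escape_rate * (1.0 - x_m) + turnover * (pop_freq - x_m)
        dx_u = -reversion_rate * x_u + turnover * (pop_freq - x_u)
        x_m += dt * dx_m
        x_u += dt * dx_u
    return x_m, x_u

fast = simulate(100.0, 100.0)    # escape and reversion within days
slow = simulate(1e-4, 1e-4)      # escape and reversion almost never happen
middle = simulate(0.5, 0.1)      # waiting times on the order of years

for label, (x_m, x_u) in [("fast", fast), ("slow", slow), ("middle", middle)]:
    print(f"{label:>6}: matched {x_m:.3f}, unmatched {x_u:.3f}")
```

With fast rates the matched hosts end up almost entirely with escape virus and the unmatched hosts almost entirely with wildtype; with slow rates the two fractions stay nearly equal. Inferring rates from data means running this logic in reverse: find the rates for which the model reproduces the observed fractions.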

4. Why are the estimates from cross-sectional data not good enough?

Unfortunately, the estimates from cross-sectional data are only point estimates, and maybe not very good ones. The problem is that the method (implicitly) assumes that each virus is independently derived from an ancestor at the beginning of the epidemic. For example, if there are a lot of escape mutant viruses in the dataset, then the estimated rate of escape will be high. However, the high number of escape mutant virus may be due to one or a few escape events early on in the epidemic that got transmitted to a lot of other patients. It is a classical case of non-independence of data. It could lead us to believe that we can have more confidence in the estimates than we should have.
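
A small simulation shows how much shared ancestry can matter. This is a toy sketch with assumptions of my own: patients are grouped into lineages, each lineage either escaped early on or did not, and every patient in a lineage inherits that state. The number of lineages and the escape probability are invented; only the sample size of 84 patients is taken from the paper.

```python
import random
import statistics

def escape_fraction(n_patients, n_lineages, escape_prob, rng):
    """Fraction of patients with escape virus when patients share lineages.

    Each lineage escapes (or not) once, and every patient in that lineage
    inherits the result. n_lineages == n_patients means independent data.
    """
    per_lineage = n_patients // n_lineages
    n_escape = sum(per_lineage for _ in range(n_lineages)
                   if rng.random() < escape_prob)
    return n_escape / (per_lineage * n_lineages)

def spread(n_patients, n_lineages, escape_prob=0.3, replicates=2000, seed=1):
    """Mean and standard deviation of the escape fraction over replicates."""
    rng = random.Random(seed)
    fractions = [escape_fraction(n_patients, n_lineages, escape_prob, rng)
                 for _ in range(replicates)]
    return statistics.mean(fractions), statistics.pstdev(fractions)

independent = spread(84, 84)   # every patient is its own lineage
clustered = spread(84, 6)      # six early lineages of 14 patients each

print("independent: mean %.3f, sd %.3f" % independent)
print("clustered:   mean %.3f, sd %.3f" % clustered)
```

The average fraction of escape virus is the same in both scenarios, but the fraction varies far more from one replicate to the next when patients share lineages. A method that treats the 84 patients as independent would therefore report too much confidence.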

5. Genealogical data to the rescue!

Fortunately, the authors have viral sequences that provide much more information than just whether or not the virus is an escape mutant. The sequences of the virus can inform us about the underlying genealogical tree and can tell us how non-independent the data really are (two escape mutants that are very close to each other in the tree are not very independent). The goal of the current paper is to use the genealogical data to get better estimates of the escape and reversion rates.

A large part of the paper deals with the nuts and bolts of how to combine all the data, but in essence, this is what they do: They first estimate the genealogical tree for the viruses of the patients for which they have data (while allowing for uncertainty in the estimated tree). Then they add information on the states of the tips (wildtype vs escape for the virus and matched vs unmatched for the patient), and use the tree with the tip-labels to estimate the rates. This seems to be a very useful new method that may give better estimates and a natural way to get credible intervals for the estimates.
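
To give a flavor of how a tree with tip-labels can inform rates, here is a minimal sketch of a standard technique: computing the likelihood of tip states on a fixed tree under a two-state Markov model (Felsenstein's pruning algorithm). This is not the authors' method. It ignores whether hosts are matched or unmatched, it ignores uncertainty in the tree, and it simply finds the best rate on a grid. The tree, branch lengths, and rate values are invented.

```python
import math

def transition(escape_rate, reversion_rate, t):
    """2x2 transition probabilities over time t (0 = wildtype, 1 = escape)."""
    total = escape_rate + reversion_rate
    decay = math.exp(-total * t)
    p_esc = escape_rate / total      # long-run probability of the escape state
    p_wt = reversion_rate / total    # long-run probability of the wildtype state
    return [[p_wt + p_esc * decay, p_esc * (1.0 - decay)],
            [p_wt * (1.0 - decay), p_esc + p_wt * decay]]

def partial(node, escape_rate, reversion_rate):
    """Likelihood of the tip states below `node`, for each possible node state."""
    if isinstance(node, int):                 # a tip: observed state 0 or 1
        return [1.0 - node, float(node)]
    result = [1.0, 1.0]
    for child, branch_length in node:         # internal node: list of (child, length)
        probs = transition(escape_rate, reversion_rate, branch_length)
        below = partial(child, escape_rate, reversion_rate)
        for i in (0, 1):
            result[i] *= probs[i][0] * below[0] + probs[i][1] * below[1]
    return result

def likelihood(tree, escape_rate, reversion_rate, root_prior=(1.0, 0.0)):
    """Likelihood of all tip states; by default the root is assumed wildtype."""
    at_root = partial(tree, escape_rate, reversion_rate)
    return root_prior[0] * at_root[0] + root_prior[1] * at_root[1]

# Invented tree with five tips; branch lengths are in years.
tree = [([(1, 2.0), (1, 2.0)], 3.0), (0, 5.0), ([(0, 1.0), (1, 1.0)], 4.0)]

# Crude grid search for the escape rate, holding the reversion rate fixed.
grid = [0.01 * k for k in range(1, 101)]
best = max(grid, key=lambda rate: likelihood(tree, rate, 0.05))
print("escape rate with the highest likelihood on the grid:", best)
```

In this sketch, two escape mutants that sit next to each other in the tree can be explained by a single escape event on their shared branch, so they count for less than two escape mutants on distant branches. That is the sense in which the tree corrects for non-independence.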

The results they obtain with the new method are similar to the previous results for three CTL epitopes and show slower rates for one CTL epitope. The credible intervals are quite wide, which shows that the data (from 84 patients) really don’t contain a whole lot of information about the rates, possibly because the trees are rather star-shaped, due to the exponential growth of the epidemic. Interestingly, the fact that the tree is rather star-shaped could explain why the older approach (based only on cross-sectional data) worked quite well. However, this will not necessarily be the case for other datasets.

Question for the authors

Do you use the information about the specific escape mutations in the data? Certainly not all sequences that are considered “escape mutants” carry exactly the same nucleotide changes? Whenever they carry different mutations, you know they must be independent.