This is an excellent list of principles for evaluating scientific evidence. Almost all of them apply equally well to evaluating whether a manager or investment approach will outperform in the future. To wit:
- Differences and change cause variation – outperformance is subject to a very high level of sheer randomness.
- Bias is rife – particularly given the huge rewards to finding (or claiming to find) an edge.
- Bigger is usually better for sample size – a 3-, 5-, or even 10-year track record is essentially meaningless (see the first sketch after this list).
- Correlation does not imply causation – self-explanatory.
- Regression to the mean can mislead – just when a manager or effect looks great (or terrible), performance will tend to become much more typical.
- Extrapolating beyond the data is risky – it may have worked before, but the future is out-of-sample.
- Beware the base-rate fallacy – given the rarity of true alpha, historical alpha is probably luck, not skill (the Bayes sketch after this list makes this concrete).
- Controls are important – and you generally don’t have a control in investing.
- Randomization avoids bias – and again you generally don’t have this in investing.
- Seek replication, not pseudo-replication – rolling periods are not independent tests because the data overlap (see the rolling-window sketch after this list).
- Scientists are human – and product providers are even more prone to bias given the enormous incentives.
- Significance is significant – if you try 20 things, on average one will appear significant at the 5% level by sheer randomness (simulated in the last sketch after this list).
- Separate no effect from non-significance – sample sizes (track records) are usually too small to detect the effect even when the manager or strategy really does have true alpha.
- Effect size matters – differentiate between results that are statistically significant and those that are economically meaningful (generally not a problem in investing: if it is significant, it will generally be meaningful).
- Study relevance limits generalizations – the market/economy/etc. isn’t the same now as it was in the historical data.
- Feelings influence risk perception – self-explanatory.
- Dependencies change the risks – we rarely have independent factors.
- Data can be dredged or cherry-picked – and in our industry it will not just be dredged or cherry-picked, but tortured to death.
- Extreme measurements may mislead – lots of things led to the performance, not just the factor you are focused on.
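On sample size (and on separating no effect from non-significance): the t-statistic of a track record is roughly the annual Sharpe ratio times the square root of the years observed, so even genuine skill takes decades to show up statistically. A minimal sketch; the Sharpe ratios and the 1.96 threshold are illustrative assumptions, not claims about any manager:

```python
def years_needed(sharpe, t_stat=1.96):
    """Years of track record needed for statistical significance,
    using the approximation t ≈ sharpe * sqrt(years)."""
    return (t_stat / sharpe) ** 2

# A true annual Sharpe of 0.5 -- a genuinely good manager -- needs
# roughly (1.96 / 0.5)^2 ≈ 15 years just to clear the 5% bar.
for sharpe in (0.25, 0.5, 1.0):
    print(f"Sharpe {sharpe:.2f}: ~{years_needed(sharpe):.0f} years for t > 1.96")
```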
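On the base-rate fallacy, a small Bayes' rule sketch. The 5% prior on true alpha, the 50% test power, and the 5% false-positive rate are all illustrative assumptions:

```python
def p_skill_given_significant(prior_skill=0.05, power=0.50, false_pos=0.05):
    """P(true alpha | significant track record) via Bayes' rule."""
    p_significant = power * prior_skill + false_pos * (1 - prior_skill)
    return power * prior_skill / p_significant

# Even a statistically significant record leaves roughly a
# two-in-three chance the "alpha" was luck: ~34% here.
print(f"P(skill | significant) = {p_skill_given_significant():.0%}")
```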
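On pseudo-replication, a sketch of why rolling periods are not separate tests: even with purely i.i.d. returns, adjacent 5-year rolling windows share four of their five years of data, so they are mechanically correlated. The return parameters here are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
annual = rng.normal(0.05, 0.15, size=10_000)  # i.i.d. annual returns

# Rolling 5-year average returns: adjacent windows overlap in 4 of 5 years.
window = 5
rolling = np.convolve(annual, np.ones(window) / window, mode="valid")

# Correlation between consecutive rolling observations is ~0.8, so
# N rolling periods carry far less information than N independent tests.
print(f"lag-1 correlation: {np.corrcoef(rolling[:-1], rolling[1:])[0, 1]:.2f}")
```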
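And on significance, a simulation of the try-20-things problem: twenty zero-alpha "strategies" of pure noise, each t-tested against zero. The count flagged varies with the random seed, but it averages about one:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# 20 "strategies" of pure noise: 120 months of zero-mean returns each.
returns = rng.normal(0.0, 0.04, size=(20, 120))

# One-sample t-test of each strategy's mean return against zero.
t_stats, p_values = stats.ttest_1samp(returns, 0.0, axis=1)
print(f"{(p_values < 0.05).sum()} of 20 zero-alpha strategies look significant")
```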