A recent paper (“Evaluating Trading Strategies”) on the hazards of backtesting in The Journal of Portfolio Management has been receiving a fair amount of attention lately, inspiring some folks to go off the deep end and argue that econometric analysis of investment strategies is always a bad idea. That’s a bit much for the simple reason that the only alternative to backtesting is blindly throwing money at markets without the benefit of perspective. Yes, backtesting can be dangerous when designed without best practices in mind. But in the grand scheme of investing, we’re all relying on one form of backtesting or another.
Some investors think they’re immune, but they’re fooling themselves. Take the buy-and-hold investor who claims to invest for the long run in a simple stock/bond mix that’s periodically rebalanced, thereby shunning the evils of short-term trading and, by extension, backtesting. Sounds good, but there’s a good chance that the reason this investor embraces the strategy is that he’s aware of the encouraging track record of balanced portfolios over the long haul. A cursory glance at the Ibbotson data, for instance, has been known to motivate some to argue that you can do quite well by holding a stock/bond portfolio through time.
Perhaps, but the fact that you peeked at the historical record and decided you’d like to jump on board this investment train is a conclusion based on… drumroll… a backtest! Okay, it’s an informal one if you’re reviewing someone else’s handiwork. But whether you’re diving into the nitty-gritty details and writing your own code or relying on someone else to do the heavy lifting, it’s all part of the same game: using history to make a decision on how to invest for the future.
That’s a broad brush, of course, and there are as many ways to backtest as there are stars in the sky. That leaves plenty of room for error. For an amusing review of how things can go horribly wrong, take a look at Jason Zweig’s column from last June in The Wall Street Journal (“Huge Returns at Low Risk? Not So Fast”).
The good news is that many of the obvious pitfalls of backtesting can be avoided. For instance, so-called look-ahead bias—calculating strategy returns based on information that wasn’t available in real time—can be defused by lagging the signal. In fact, there’s a laundry list of things to sidestep in the course of developing a robust backtest. One of the more pernicious dangers is overlooking the fact that many profitable-looking backtests can be generated randomly. Throw enough trading signals into the data grinder and the software is sure to spit out wonderful results via strategies that no one in their right mind should use.
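To make the look-ahead point concrete, here’s a minimal sketch (not from the article; the data and parameters are hypothetical, simulated with NumPy). The same trailing-mean signal is scored twice: once with the bias baked in, and once with an honest one-day lag.

```python
import numpy as np

# Hypothetical daily returns: pure noise with a tiny drift.
rng = np.random.default_rng(42)
rets = rng.normal(0.0003, 0.01, 2000)

window = 20
# Trailing-mean signal: sig[t] is computed from returns through day t.
sig = np.array([np.sign(rets[max(0, t - window + 1): t + 1].mean())
                for t in range(len(rets))])

# Biased backtest: trades day t with a signal that already saw day t's return.
biased_pnl = float((sig * rets).sum())

# Honest backtest: lag the signal one day, so only data through t-1 is used.
lagged_pnl = float((sig[:-1] * rets[1:]).sum())

print(f"with look-ahead: {biased_pnl:.3f}  lagged: {lagged_pnl:.3f}")
```

Because the biased version trades day t using a signal that already incorporates day t’s return, it reliably looks better than the lagged version even on what is essentially random noise—a two-for-one illustration of the pitfalls above.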
Ultimately one should start from a basis in economic logic—stocks tend to rise through time because earnings and dividend payouts tend to rise. Why? Because economies grow… most of the time. But there’s an element of backtesting here as well. You can dress it up with macro theory and all sorts of academic bells and whistles, but the fact that long-term charts of US gross domestic product and the S&P 500 reflect positive slopes inspires more than a few investors to assume that the future will resemble the past.
Yes, we are all slaves to history to some degree, and that carries quite a lot of risk. It’s a reminder that the details are crucial in the delicate art/science of studying the past and interpreting the signals (and the noise). Two researchers with very different views of the world and different working models can look at the same data set and come to very different conclusions about what works vs. what doesn’t. But that’s not an argument for banishing backtests; rather, it’s an observation that we can’t spend too much time thinking about what could go wrong.
Fortunately, analysts have been studying the traps for decades. True, you can’t sidestep all the risks that bedevil backtests, although there’s no excuse for making the common mistakes that burden many (most?) backtesting results.
It’s easy to badmouth backtesting, and to a degree this is a healthy dialogue. There are no silver bullets, no matter how sophisticated your model. But there’s really no alternative to looking in the rearview mirror. The question, then, boils down to how you’re interpreting history. Results will vary, as they say. The key is figuring out why they vary and what constitutes informed design.
I normally don’t leave comments, but your post struck a chord. In fact, I’ve had this exact discussion with several colleagues recently.
It’s interesting to consider the P (real-world) vs. Q (risk-neutral) quant ways of thinking. In the Q world, everything is about accurate calibration to option prices. If you can’t match the vanilla option prices, you may as well throw your model out from the start. And then, the next day (or week), you recalibrate to match prices exactly again, parameter stability be damned. From a P perspective, this would be considered the ultimate form of model ‘overfitting’.
There’s also another divide among quants: those who claim that history is just one single realised path of a random variable, and that the only true method of analysis is to work directly with simulated ‘full’ distributions. And then there are those who say that the chance of mis-specifying the distribution is so large that it’s better to work with history, which at least represents a realisation of the true distribution.
It speaks to the underlying personality of the researcher I guess.
Also, very true about the fact that the models we use today are only in use because they do a good job of re-creating history.
Thanks,