3 Common Backtesting Traps With Easy Solutions

Backtests have become the weapon of choice for rationalizing various forms of tactical asset allocation, which has become increasingly popular as a risk-management tool since the 2008 crash. The hazards of backtesting (studying how a strategy performed in the past) are well known, which leads some folks to shun the concept entirely. But that’s going too far.

In some respects, every investment plan owes a debt to some type of backtesting. That includes a buy-and-hold strategy, which assumes that the future will deliver gains on par with what was earned in the past. The proper lesson is to design robust backtests, which demands close attention to detail. Easier said than done, of course, in part because the pitfalls can be subtle. Here are three that routinely trip up the novice and perhaps even some experienced investors:

1) using total-return prices for technical signals
2) failing to correct for look-ahead bias by not using lagged signals
3) overlooking the importance of neutral signals for computing backtest results

The good news is that these traps are easily avoided. But there’s a catch: you have to be aware of the hazards. With that in mind, let’s briefly review these backtesting snares with some simple examples.

Total return data. Imagine that you’ve created what you think of as a winning investment strategy that’s based on two signals: a) the ratio for a set of short and long moving averages; b) the trailing return for a rolling x-day window. The results look encouraging, but the upbeat outcome may be an illusion if the calculations use total return prices.

Why? Consider a mutual fund that’s unchanged on the day but dispenses a hefty distribution at the close of trading. Imagine that this fund is priced at $10 a share and it spits out a 50-cent-per-share payout. Although the underlying portfolio value was unchanged on the day, the mutual fund’s price falls by 50 cents to $9.50 to compensate for the distribution. The net result for shareholders: the value of their holdings in the fund is unchanged on the day, because the 50-cent-per-share drop is offset by a 50-cent distribution. In short, a net wash.

It’s a routine affair in day-to-day market activity but it’s a trap if you’re looking at a fund’s technical profile without adjusting for distributions. Let’s say that the 50-cent price decline pushes the fund into negative territory in terms of the short/long moving-average ratio and trailing x-day return. On the surface, this looks like a sell signal when in fact it’s nothing of the sort since the fund’s portfolio value hasn’t changed.
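The $10 fund example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the five-day price history, the three-day window, and the helper names (`trailing_return`, `add_back_distributions`) are hypothetical, and distributions are simply added back as cash rather than reinvested.

```python
# Hypothetical history: the fund is flat at $10, then pays $0.50 on the last day.
prices = [10.00, 10.00, 10.00, 10.00, 9.50]
distributions = [0.00, 0.00, 0.00, 0.00, 0.50]

def trailing_return(series, window):
    """Return over the last `window` periods of the series."""
    return series[-1] / series[-1 - window] - 1

def add_back_distributions(prices, distributions):
    """Per-share value of a holding: price plus all cash paid out so far."""
    values, cash = [], 0.0
    for price, dist in zip(prices, distributions):
        cash += dist
        values.append(price + cash)
    return values

# Signal computed on the price alone: the payout looks like a 5% loss.
raw_signal = trailing_return(prices, 3)

# Signal computed on holdings value: the payout is a wash.
holdings = add_back_distributions(prices, distributions)
adjusted_signal = trailing_return(holdings, 3)

print(f"price-only trailing return: {raw_signal:.2%}")
print(f"holdings trailing return:   {adjusted_signal:.2%}")
```

The price-only series turns negative on the distribution day even though the shareholder’s position is worth the same as before, which is the false “sell” described above.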

The solution is to use price data that strips out distributions. If you don’t make that adjustment, your backtests using technical signals are probably faulty. Keep in mind too that the total return price histories aren’t real in the sense that the prices have been retroactively adjusted down to compensate for dividends, capital gains, etc. In other words, total return prices weren’t available in real time through history. Ignoring this issue runs the risk that your backtests are telling lies.

Lagged signals and look-ahead bias. This is another common mistake that can turn a sow’s ear into a silk purse, if only on paper. There are many variations to this trap, depending on the complexity of the strategy, but the basic form can be illustrated with a simple example.

Take a strategy that issues a “sell” signal when price falls below an x-day moving average and a “buy” when price rises above that average. Let’s also assume that we’re using end-of-day closing prices. You test the strategy and discover that it delivers a strong performance through time. But you forget one small item: the end-of-day signals aren’t available until after the market closes. In other words, calculating returns for a real-world version of the strategy requires using lagged “buy” and “sell” signals.

One solution: assume a one-day lag. If a “sell” signal is issued at Monday’s close, assume that the security was sold at the following day’s close.

How much difference will such a seemingly minor change make in a strategy’s results? A lot. Indeed, many strategies that look wonderful in backtests turn into dogs after correcting for look-ahead bias.
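To make the effect of lagging concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the closing prices are made up, the strategy is a simple long-or-cash rule on a 3-day moving average, and the `lag` parameter is a hypothetical knob. With `lag=0` the signal from a day’s close is applied to that same day’s return (the signal is used before it exists); with `lag=1` the trade is assumed to happen at the very close that produced the signal; with `lag=2` the trade happens at the following day’s close, as in the Monday/Tuesday example.

```python
def moving_average(prices, window):
    """Simple moving average; None until `window` prices are available."""
    out = []
    for i in range(len(prices)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(prices[i + 1 - window : i + 1]) / window)
    return out

def signals(prices, window):
    """1 (hold the asset) when the close is above its average, else 0 (cash)."""
    ma = moving_average(prices, window)
    return [1 if m is not None and p > m else 0 for p, m in zip(prices, ma)]

def strategy_returns(prices, window, lag):
    """Daily returns when the signal from `lag` days earlier sets the position."""
    sig = signals(prices, window)
    rets = []
    for t in range(1, len(prices)):
        daily = prices[t] / prices[t - 1] - 1
        position = sig[t - lag] if t - lag >= 0 else 0  # no signal yet: stay in cash
        rets.append(position * daily)
    return rets

def total_return(returns):
    """Compound a list of daily returns into one cumulative return."""
    growth = 1.0
    for r in returns:
        growth *= 1 + r
    return growth - 1

prices = [100, 102, 101, 104, 103, 99, 98, 101, 105, 104]  # made-up closes
results = {lag: total_return(strategy_returns(prices, 3, lag)) for lag in (0, 1, 2)}
for lag, result in results.items():
    print(f"lag={lag}: {result:.2%}")
```

On this toy series the unlagged version shows a gain while both lagged versions lose money. The numbers mean nothing in themselves; the point is only that the same rule can flip from winner to loser once the signal is applied when it would actually have been available.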

Neutral signals. This is an especially subtle problem because it’s counterintuitive in some respects.

The problem arises when there’s a gray area with one or more trading signals. For instance, let’s say you’re using two signals to determine if the current climate for an asset is bullish or bearish. A “buy” is when both signals are bullish; a “sell” is when both are bearish. If there’s a split decision (one is bullish, the other bearish), the signal is neutral, which is to say that the previous signal holds until both signals indicate a decisive change, one way or the other.

As an example, say both signals turned bullish on the first trading day of the month, producing a “buy”. Two weeks later one of the signals turns bearish but there’s no confirmation in the other signal, which continues to align with a bullish reading. The net result: we no longer have a fresh “buy” signal, but there’s no “sell” signal either. In that case, the previous signal, a “buy”, remains in force until a “sell” signal arrives.

Obvious? Well, sure, once we spell it out and are aware of the subtlety. But designing this nuance into the code can trip up a rookie. The solution: generate a historical record of “buy” and “sell” signals and monitor the net result via a “position” signal. A standard system is to generate a “1” for “buy”, “0” for neutral, and “-1” for “sell” in the “position” data. By contrast, a common mistake is to only calculate the “buy” signals and assume that the absence of a “buy” is the equivalent of “sell”. Not necessarily, but that won’t be obvious unless you compute a separate set of “sell” and “neutral” signals.
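One way to sketch the two-signal logic in Python is shown here. The function names and the six-day signal history are hypothetical, and starting flat (`initial=0`) before the first decisive signal is an assumption. The raw series uses 1, 0 and -1 as described above, and the position simply carries the last decisive reading through neutral days.

```python
def raw_signals(sig_a, sig_b):
    """1 when both signals are bullish, -1 when both are bearish, 0 on a split."""
    out = []
    for a, b in zip(sig_a, sig_b):
        if a and b:
            out.append(1)
        elif not a and not b:
            out.append(-1)
        else:
            out.append(0)
    return out

def positions(raw, initial=0):
    """Carry the last decisive signal (1 or -1) forward through neutral days."""
    pos, current = [], initial
    for s in raw:
        if s != 0:
            current = s
        pos.append(current)
    return pos

def naive_positions(raw):
    """The mistake: treat every day without a 'buy' as a 'sell'."""
    return [1 if s == 1 else -1 for s in raw]

# Hypothetical daily readings (True = bullish) for the two signals.
sig_a = [True, True, False, False, False, True]
sig_b = [True, True, True, True, False, False]

raw = raw_signals(sig_a, sig_b)
held = positions(raw)
naive = naive_positions(raw)

print("raw:     ", raw)
print("position:", held)
print("naive:   ", naive)
```

On days three and four the signals disagree: the carried-forward position stays long, while the naive version has already flipped to short. The two series, and any returns computed from them, diverge from that point.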

What’s the relevance? Results. A backtest that equates “neutral” with “buy” signals can and usually does dispense substantially different results vs. a test that recognizes the distinction. Ok, maybe you want to blur the lines for tactical reasons. That’s fine. The danger arises when the analyst doesn’t spot the difference in advance.

These are hardly the only pitfalls in backtesting, but they’re relatively common and easily avoided. The question is whether these quantitative stumbles have skewed results in some of the more influential backtests that have found a wide audience in recent years. The answer: unclear until (and unless) we can reproduce the research. Unfortunately, most of the backtests that make the rounds these days don’t provide the accompanying code. That’s one more reason why it’s essential to crunch the numbers directly before making substantial monetary commitments to a given strategy.

As President Reagan famously advised, trust but verify. That’s a good policy for geopolitical negotiations and for backtesting investment strategies.

8 thoughts on “3 Common Backtesting Traps With Easy Solutions”

  1. John

    I am a bit confused by the wording of the total return data trap. It appears using total return data in constructing a model is common sense. One prime example would be for time series momentum. Is it correct that you are saying that to do a proper backtest you must use total return data?

  2. James Picerno Post author

    John,
    In some instances total return data is fine and even preferred. If, for instance, you’re modeling the historical relationship between several asset classes and looking for insights into how each performed it’s wise to use total returns. Indeed, it’s clear that dividends make up a significant portion of equity performance through time and so ignoring that effect would be misleading. That said, if you’re testing a trading strategy that recreates real-time signals in history, total return prices can be misleading, for the reasons noted. The bottom line: it’s important to use the right time series for a given research project. Sometimes that means using total return prices, but not always.
    –JP

  3. Ilya Kipnis

    I’d argue that even for real time signals, you can use total returns. Prices = cumulative product of total returns up to that point, at which point, feel free to use your SMAs and RSIs to your heart’s content.

    Regarding look-ahead bias, what you describe isn’t look-ahead bias (E.G. a sell order on Monday’s close executed on Tuesday’s close), but instead, magical thinking of “see the close, execute at the close”, which can be approximated by “execute 5 minutes before close for sufficiently small sizes”. Another approximation would be to use the next day’s open, which SHOULDN’T devastate a strategy too hard. If it does, it means your strategy depends on that magical thinking execution and has only the capacity you can execute in those last few minutes between observation and actual close. Lookahead bias is necessarily a mistake in implementation. EG computing your returns with (price/lag(price)-1), then forgetting to lag your signal by 1, thus calculating your returns as starting on the close BEFORE you got your signal. THAT is lookahead bias. The other one is magical thinking.

    As for neutral signals, there’s a very easy solution, for those who ever bothered to look at how quantstrat’s mechanics actually work. Initialize your signal to a bunch of NAs, then change only the days of buy/flat/short signals to 1, 0, and -1, respectively. Then, just na.locf and you’re done.

  4. Pingback: Quantocracy's Daily Wrap for 11/04/2015 | Quantocracy

  5. James Picerno Post author

    Good points, Ilya. I agree that one can use total returns for, say, moving averages and other technical analytics in real time for insight. The trouble arises when looking into history with total return data. Recognizing the distinction is crucial. Thanks for clarifying. As for look-ahead bias vs. magical thinking, interesting point re: the nomenclature. Perhaps it’s time to update my lexicon. Appreciate the insight.
    –JP

  6. Govind

    We receive total returns and not just price appreciation, so total returns seem more logical to me. As you point out in your comments, dividends can make up a significant part of one’s return. For fixed income instruments, it can make up most of the return. If there is a distribution and an adjustment is made by adding back the distribution to offset the drop in asset price, then that should be all that is needed as long as there are no lags in doing this.

    As for lagged signals, I’ve never had a problem anticipating a little before the close if a signal will be given using real time price charts. I can then execute both sells and buys almost simultaneously.

  7. Pingback: 11/09/15 – Monday’s Interest-ing Reads | Compound Interest-ing!

  8. Pingback: 3 Common Backtesting Traps With Easy Solutions  | SAMUELSSONS RAPPORT
