Backtesting in the Age of Financial Machine Learning

October 2nd, 2021 | 9-minute read

As a smart trader, you know the importance of backtesting your trading signals to develop or refine your trading system. The reason is that trading strategies that worked well in the past may stop doing so in the future. If you are smart but less experienced with what machine learning has brought to the market, you may not have paid attention to how backtesting must change to truly contribute to alpha creation and to trading in general. Two related issues are at play here: (a) machine learning has the power to democratize access to relationships in data, but (b) financial data has a very low signal-to-noise ratio.

In this article, I talk about the future of backtesting. You will learn how to approach backtesting and what to avoid, especially in this era. For starters, you must:

  1. Adopt a scientific approach to backtesting historical and ex ante results
  2. Develop and maintain an exact protocol for testing, considering data limitations and common potential traps

Doing so helps you avoid data overfitting and data snooping, and it makes cross-validation and economic reasoning a serious part of your process.

Ready to learn how to backtest signals? Let’s get started. But first, here are a few definitions that will help as we learn about backtesting in the age of machine learning:

Definition of Terms

  • Data overfitting refers to an analysis that corresponds too closely to a particular data set and therefore performs poorly on other data sets.
  • Data snooping (aka curve fitting) refers to deciding which statistical inferences to perform after looking at the data, rather than planning them in advance. It typically involves an exhaustive search through combinations of variables, which is likely to bias the results.
  • Cross-validation (aka out-of-sample testing) is any of a number of statistical techniques for assessing how statistical results generalize to an independent data set.
  • Economic reasoning assumes that when people face tradeoffs, they respond to incentives, and that they think and decide at the margin, i.e., based on marginal changes in their situation.
  • Nowcasting is the prediction of the present, the very near future, or the very recent past.

Information Explosion: Zettabytes of Data

Did you know that the world’s digital data was expected to reach 40 zettabytes (40 trillion gigabytes) by the end of 2020? This staggering amount of information is hard for the human mind to comprehend.

To put this in perspective, the average person was estimated to create 1.7 MB of data every second, and that figure keeps rising. Plotted over time, the total amount of digital data has been doubling roughly every two years for the past 10 years.

By 2025, the total is predicted to reach 175 ZB.
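As a quick sanity check, the two figures quoted above (about 40 ZB at the end of 2020 and 175 ZB predicted for 2025) can be turned into an implied constant growth rate and doubling time. This is a minimal sketch that assumes smooth compound growth between those two points. The function names are illustrative only.

```python
import math

def implied_annual_growth(start: float, end: float, years: float) -> float:
    """Constant annual growth rate that takes `start` to `end` in `years` years."""
    return (end / start) ** (1.0 / years) - 1.0

def doubling_time(annual_growth: float) -> float:
    """Years needed to double at a constant annual growth rate."""
    return math.log(2.0) / math.log(1.0 + annual_growth)

# Figures quoted in the text: ~40 ZB (end of 2020) and 175 ZB predicted for 2025.
g = implied_annual_growth(40.0, 175.0, 5.0)
print(f"implied growth: {g:.1%} per year, doubling every {doubling_time(g):.2f} years")
```

This works out to roughly 34% growth per year, or a doubling time of about 2.3 years. That is in line with the "doubling every two years" pattern described above.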

Digital Data: Gold Mine or Garbage Dump?

While it’s not wrong to think that the digital age has brought us a gold mine of information, the reality is that most of this data is noise. Traders looking for signals, or for clean, filtered data that supports their hypotheses, face a critical limitation: time.

Furthermore, the window to act on information when trading assets is limited, and you often need to decide immediately. The more noise there is, the harder it is to see things clearly.

The Power Of Noise And Backtesting

Naturally, you cannot expect to find trading signals easily (i.e., signals that point to potential expected gains or losses and/or improve risk management practices). In fact, naive economic theory argues that the search may be futile: if prices already reflect all available information, there is no signal left to find.

On the other hand, rational economic theory says that signals may exist in the presence of noise. However, the benefits you reap from trading on them are limited by the collective cost of discovering them. If the benefits were larger, other traders would enter your space and the opportunity would disappear.
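The entry argument can be made concrete with a toy free-entry model. This is a hypothetical illustration and not a model from the original text. It assumes a signal produces a fixed total profit that is shared equally among the traders who use it, and that each trader pays a fixed discovery cost. Traders keep entering while a newcomer would still cover that cost.

```python
def equilibrium_traders(total_profit: float, discovery_cost: float) -> int:
    """Toy free-entry model: traders keep entering while the next entrant,
    sharing the signal's profit equally, would still cover its discovery cost."""
    if discovery_cost <= 0:
        raise ValueError("discovery_cost must be positive")
    n = 0
    while total_profit / (n + 1) >= discovery_cost:
        n += 1
    return n

def net_profit_per_trader(total_profit: float, discovery_cost: float) -> float:
    """Net profit of each trader once entry has stopped (0.0 if nobody enters)."""
    n = equilibrium_traders(total_profit, discovery_cost)
    return total_profit / n - discovery_cost if n > 0 else 0.0

# Hypothetical numbers: a signal worth 100 in total, costing 15 to discover.
print(equilibrium_traders(100.0, 15.0), round(net_profit_per_trader(100.0, 15.0), 2))
```

With these made-up numbers, six traders enter and each nets only about 1.67. Once entry stops, the benefit of the signal barely exceeds the cost of discovering it.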

Understanding rational economic theory is essential to understanding the limitations of signals and the significance of proper backtesting.

Not All Signals Are Created Equal

For decades, economists have argued that observing price behavior could reveal the direction of individual and collective market behavior. They believed the rise and fall of prices could be read as indicators of market sentiment, reflecting both objective and subjective information. While such suggestions have never found solid theoretical or consistent statistical backing, observing individuals' direct interests may yield more plausible results.

What’s great about having zettabytes of the world’s data at our disposal is that we can mine signals from online interactions. These seem to offer a new perspective on market participants' behavior during periods of large market movements. But what exactly should we test?

Why Nowcasting Instead of Forecasting

From a signal standpoint, and given all these traps, traders should focus on nowcasting instead of the usual forecasting models. Here are some of the advantages of nowcasting:

  • Direct measurements hold up better because they do not rely on a statistical lead-lag relationship
  • Short-range predictions are statistically more reliable than long-range ones, a point consistent with the observation that most published discoveries or signals in finance turn out to be false after a while

Unfortunately, a common but shaky practice among some academics and many practitioners is to run tens of thousands of historical backtests to identify a promising investment strategy. The best, cherry-picked test is then reported as if it had been the only trial. This then becomes the basis for a publication or for launching a new fund.
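A small simulation shows why this practice is dangerous. The sketch below is illustrative and is not the author's protocol. It backtests many strategies whose true edge is exactly zero, using simulated noise returns, and keeps only the best in-sample Sharpe ratio. The three-year daily horizon, the 1% daily volatility, and the zero risk-free rate are all assumptions.

```python
import math
import random

def annualized_sharpe(returns, periods_per_year=252):
    """Annualized Sharpe ratio of periodic returns, assuming a zero risk-free rate."""
    n = len(returns)
    mu = sum(returns) / n
    sd = math.sqrt(sum((r - mu) ** 2 for r in returns) / (n - 1))
    return math.sqrt(periods_per_year) * mu / sd

def best_of_n_noise_backtests(n_trials, n_days=252 * 3, seed=0):
    """Backtest `n_trials` strategies whose true edge is zero (pure noise returns)
    and return the best in-sample Sharpe ratio, i.e., the one that gets reported."""
    rng = random.Random(seed)
    best = -math.inf
    for _ in range(n_trials):
        daily = [rng.gauss(0.0, 0.01) for _ in range(n_days)]
        best = max(best, annualized_sharpe(daily))
    return best

for n_trials in (1, 10, 100, 1000):
    print(f"{n_trials:>5} trials -> best Sharpe {best_of_n_noise_backtests(n_trials):.2f}")
```

None of these strategies has any edge, yet the best Sharpe ratio keeps climbing as more trials are run. The winner of a thousand trials will typically show an annualized Sharpe ratio well above 1, purely by chance.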

Now that we know this… how exactly should we test signals?

The Backstage of Backtesting

Academic and industry research has shown the power of backtesting, but not all backtesting protocols are created equal. Applying a methodical protocol is what lets us improve reasonable strategies. To find the best strategies (e.g., when comparing active trading strategies to validate a signal), we must identify the ones that statistically support the signal. Otherwise, the signal is weak and/or the strategy requires significant adjustment.

For instance, if we compare two strategies (an active one vs. “buy & hold”) using one test statistic (e.g., the Sharpe ratio), we might ask: “Does the active strategy produce higher Sharpe ratios under different scenarios?”
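Here is a minimal sketch of that comparison. It assumes the active strategy is described by a hypothetical series of exposures between 0 and 1, and that the risk-free rate is zero. The exposure chosen at the close of one period earns the next period's return, which avoids look-ahead bias. The function names are illustrative.

```python
import math

def sharpe(returns, periods_per_year=252):
    """Annualized Sharpe ratio of periodic returns, assuming a zero risk-free rate."""
    n = len(returns)
    mu = sum(returns) / n
    sd = math.sqrt(sum((r - mu) ** 2 for r in returns) / (n - 1))
    return math.sqrt(periods_per_year) * mu / sd

def active_returns(asset_returns, exposures):
    """Exposure decided at the close of period t-1 earns the return of period t,
    so the strategy never trades on information it did not yet have."""
    return [exposures[t - 1] * asset_returns[t] for t in range(1, len(asset_returns))]

def active_vs_buy_and_hold(asset_returns, exposures):
    """Sharpe ratios of the active strategy and of buy & hold over the same periods."""
    buy_and_hold = asset_returns[1:]  # fully invested, aligned with active_returns
    return sharpe(active_returns(asset_returns, exposures)), sharpe(buy_and_hold)
```

A single comparison on the full sample answers only part of the question. To cover "different scenarios", repeat it on separate periods or regimes and check whether the active strategy's advantage survives each time.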

Backtesting 101

When backtesting trading signals, you want to compare the results of different strategies, mainly buy & hold versus various active management strategies.

Our Basic Assumptions

Often, for simplicity and to avoid making portfolio assumptions, we assume the trader adopts one of three positions: (a) keep, (b) reduce, or (c) increase, using a simple fixed percentage of the existing dollar-value exposure.
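The keep/reduce/increase rule takes only a few lines of code. This is a sketch: the 10% step is a hypothetical choice, and the code ignores how price moves change exposure between decisions.

```python
from enum import Enum

class Action(Enum):
    KEEP = "keep"
    REDUCE = "reduce"
    INCREASE = "increase"

def next_exposure(current: float, action: Action, step: float = 0.10) -> float:
    """New dollar exposure after adjusting by a fixed fraction `step` of the current exposure."""
    if action is Action.REDUCE:
        return current * (1.0 - step)
    if action is Action.INCREASE:
        return current * (1.0 + step)
    return current  # Action.KEEP

exposure = 1_000_000.0
for action in (Action.INCREASE, Action.KEEP, Action.REDUCE):
    exposure = next_exposure(exposure, action)
    print(action.value, round(exposure, 2))
```

Each step is a percentage of the current exposure, so an increase followed by a reduction does not return you to the start. You end up at 99% of your original exposure (1.1 × 0.9 = 0.99). Keep this asymmetry in mind when interpreting long backtests built on this rule.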

Designing a Testing Template

Besides (1) applying simple statistical tests and (2) recognizing the limits and traps of the protocols (such as data overfitting, lack of economic realism, or statistical insignificance), we also need to design (3) a testing template, i.e., the framework and tenets of the field. After all, backtesting a financial model is not like running a physics lab experiment.

Avoiding Fundamental Traps

Before anything else, we need to describe the basic tenets, or framework, for approaching the problem and deciding how to implement the backtest.

  1. We will identify at least two data sets, run the test on one, and compare the result with the other.
  2. We will rely on rational observation: if the two sets are very similar, the results are likely to be very similar, and when that happens, the backtest is not informative.

From this, we know that the data sets should differ. As we focus on market realities (such as the no-arbitrage principle), we should also note that there are no parallel universes in financial markets: history happens only once. We can therefore conclude that financial market data is limited.
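The two tenets above can be sketched as a chronological split into an earlier estimation set and a later test set, followed by a side-by-side look at how different the two sets are. This is a minimal sketch under stated assumptions: the 70/30 split is arbitrary, and comparing only mean and volatility is a crude, hypothetical diagnostic rather than a formal test.

```python
import math

def chronological_split(series, train_fraction=0.7):
    """Split a time series into an earlier estimation set and a later test set.
    There is no shuffling: every test observation comes after every estimation one."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    cut = int(len(series) * train_fraction)
    return series[:cut], series[cut:]

def mean_and_volatility(returns):
    """Sample mean and sample standard deviation of a return series."""
    n = len(returns)
    mu = sum(returns) / n
    sd = math.sqrt(sum((r - mu) ** 2 for r in returns) / (n - 1))
    return mu, sd

def compare_sets(returns, train_fraction=0.7):
    """Summary statistics for the estimation and test sets, side by side.
    If the two look nearly identical, the test set adds little new information."""
    estimation, test = chronological_split(returns, train_fraction)
    return mean_and_volatility(estimation), mean_and_volatility(test)
```

The split keeps time order on purpose. Shuffling observations before splitting would let the test set leak into the estimation set through overlapping periods.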

Because of this, we may need to backfill time series, since some assets do not have clear prices due to a lack of trading or price transparency.
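The article does not specify a filling method. One common, conservative option is to carry the last observed price forward for a limited number of periods. The sketch below does this, with missing prices marked as `None` and a hypothetical staleness limit of five periods. It uses only past values. Literally back-filling a gap with a later price would leak future information into the backtest.

```python
from typing import List, Optional, Sequence

def forward_fill(prices: Sequence[Optional[float]], max_stale: int = 5) -> List[Optional[float]]:
    """Carry the last observed price forward over missing periods (None),
    for at most `max_stale` consecutive periods. Only past values are used,
    so the filled series introduces no look-ahead bias."""
    filled: List[Optional[float]] = []
    last: Optional[float] = None
    stale = 0
    for p in prices:
        if p is not None:
            last, stale = p, 0
            filled.append(p)
        else:
            stale += 1
            filled.append(last if last is not None and stale <= max_stale else None)
    return filled

print(forward_fill([None, 10.0, None, None, 11.0, None], max_stale=1))
```

Stale prices make an illiquid asset look smoother than it really is, which understates its volatility. The staleness cap limits how far that distortion can go.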

5 Common Traps When Backtesting

The most important trap you’ll find when backtesting has to do with economic reasoning, but it is not the only one. Here are the traps I’ve found most often through the years:

Trap No. 1: Reusing the same historical data until you find combinations that yield statistically significant results (i.e., data snooping, which creates false positives).

Trap No. 2: Ignoring rational economic theory, or expecting the same results to hold in the future even when not all conditions are the same. While Keynes famously said the market can stay irrational longer than a trader can stay solvent, you are more likely to win a bet against an irrational market than against a rational one.

Trap No. 3: Assuming that a larger data set captures more behaviors, while ignoring research indicating that Sharpe ratios deteriorate quickly when strategies are backed by the same historical data.

Trap No. 4: Overlooking crowding. When more and more traders are equipped with the same signals, their trades become crowded and the signals lose potency.

Trap No. 5: Not having guardrails in place to protect you and your trading system. Traders must now keep backtesting and adjusting old signals as often as possible.

Can This Age Of Big Data Predict the Future?

The answer is no. With machine learning, it's tempting to think that traders can perform miracles and predict the future with all these zettabytes of data and their computing resources. Unfortunately, while you can certainly learn a lot through signals, there is a limit to what you can gain from the data you process. More data does not always mean better forecasts. 

In addition, there are no truly out-of-sample data sets in the financial world; the only true out-of-sample test is live trading. This is why you must backtest dynamically and continuously. Combined with cross-validation, this lets you keep track of how various combinations of variables perform.
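Walk-forward (rolling-origin) evaluation is one common way to backtest continuously while respecting time order. The sketch below only generates the index windows. The window sizes are hypothetical parameters, and it is one possible scheme rather than the author's protocol. Each test window starts right after its training window, and the whole split then rolls forward by one test window.

```python
from typing import Iterator, Tuple

def walk_forward_windows(n_obs: int, train_size: int, test_size: int) -> Iterator[Tuple[range, range]]:
    """Yield (train, test) index ranges that roll forward through time.
    Every test window lies strictly after its own training window, and
    consecutive test windows do not overlap."""
    start = 0
    while start + train_size + test_size <= n_obs:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += test_size

for train, test in walk_forward_windows(n_obs=10, train_size=4, test_size=2):
    print(list(train), "->", list(test))
```

Each new block of data gets evaluated only by a model that has never seen it. That is as close as a backtest gets to the "live trading" standard described above.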

Now, cross-validating and backtesting continuously means that the number of combinations you analyze grows exponentially. Even with a machine that has immense processing power and performs fast calculations, you have to wonder how meaningful the resulting signals can be. Think of it this way: 5 variables plus 2 additional ones yield 30 possible interactions (5!/(2!·2!) = 30).
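The exact count depends on how "interactions" are defined, and the article's 30 is one way of tallying them. Under any convention, though, the number of candidate specifications explodes as variables are added. The sketch below compares two standard counts: pairwise interactions and all non-empty subsets of variables, where each subset is a possible model.

```python
from math import comb, factorial

def pairwise_interactions(k: int) -> int:
    """Number of distinct two-variable interactions among k variables."""
    return comb(k, 2)

def variable_subsets(k: int) -> int:
    """Number of non-empty subsets of k candidate variables (each a possible model)."""
    return 2 ** k - 1

# The article's figure: 5!/(2! * 2!) = 30
print(factorial(5) // (factorial(2) * factorial(2)))

for k in (5, 7, 10, 20):
    print(k, pairwise_interactions(k), variable_subsets(k))
```

Going from 5 to 7 candidate variables roughly quadruples the number of possible models, from 31 to 127. With 20 variables there are already more than a million. Every one of those models is another trial in the multiple-testing sense.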

Are We Ignoring Economic Reasoning?

Unlike many other fields (e.g., physics, medicine), economics doesn't have the luxury of carrying out extensive out-of-sample tests. When you do test, it’s critical to consider individual behaviors (people's preferences, needs, and attitudes), which change over time. You can develop tests covering different economic regimes and observe the direct impact of changes in individual and collective behavior on models and data sets.

Consider Backtesting Models with Ex Ante Economic Hypotheses

For example, if a war breaks out in the Middle East and oil prices jump, you should look for interpretability in the results. Remember that if you back your results only with historical data, they may not pass a contemporary, socially grounded reading. Note that “backward analysis,” or relying on an ex post economic foundation, only produces economic stories, and these stories are often flimsy.

To make the analysis stronger, you must create a test that establishes the economic foundation in advance. Interestingly, most academics and financial analysts search for investment strategies that would perform well across all market regimes.

The likelihood that genuine “all-regime” strategies exist is slim, because markets are adaptive and investors learn from their mistakes. Even if all-regime strategies existed, they would likely be a relatively small subset of the population of strategies that work across one or more regimes.

Regime-specific investment strategies are common among market makers. This allows them to adapt quickly to new market conditions. Thanks to nowcasting, funds can apply the same approach to strategy deployment.
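Testing whether a strategy is "all-regime" or regime-specific can start with something as simple as computing its Sharpe ratio separately for each regime. This is a minimal sketch. The regime labels and returns below are hypothetical, and the labels should be fixed by an ex ante rule before looking at the results. Choosing them after the fact is just another form of data snooping.

```python
import math
from collections import defaultdict

def sharpe_by_regime(returns, regimes, periods_per_year=252):
    """Annualized Sharpe ratio (zero risk-free rate) computed separately for each regime label.
    Regimes with fewer than two observations or zero volatility get NaN."""
    groups = defaultdict(list)
    for r, label in zip(returns, regimes):
        groups[label].append(r)
    result = {}
    for label, rs in groups.items():
        if len(rs) < 2:
            result[label] = math.nan
            continue
        mu = sum(rs) / len(rs)
        sd = math.sqrt(sum((x - mu) ** 2 for x in rs) / (len(rs) - 1))
        result[label] = math.sqrt(periods_per_year) * mu / sd if sd > 0 else math.nan
    return result

# Hypothetical daily strategy returns tagged with ex ante regime labels
rets = [0.004, 0.006, 0.005, -0.003, -0.008, 0.001]
labels = ["calm", "calm", "calm", "stress", "stress", "stress"]
print(sharpe_by_regime(rets, labels))
```

In this example, the Sharpe ratio flips sign between regimes. That is a regime-specific strategy, which, as argued above, is the common case rather than the exception.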

Summing Up and Discounting

There are several essential points to consider when you create a backtesting template:

  • Looking for ways to disprove signals in different regimes, i.e., searching for evidence of inconsistencies in ex ante tests
  • Avoiding massaging the data (capturing true, or at least rule-based, outliers)
  • Recognizing that when data is limited, economic foundations become even more important, and vice versa

A Final Note

When validating across different regimes, you should be aware of correlations across markets and countries. Also, we did not tackle transaction costs, a critical issue for signal-based trading. Transaction costs and regimes are connected through economic considerations, with two important implications: different regimes justify different spreads (bid-ask and/or commissions), and this helps explain why specialized traders tend to be the ones with significant positive differentials.
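Transaction costs can be added to a backtest by charging a proportional cost on every change in position. This is a minimal sketch. The 5-basis-point default is a hypothetical all-in figure standing in for half the bid-ask spread plus commission. The timing convention is that the exposure chosen at the close of one period is held over the next.

```python
def net_strategy_returns(asset_returns, exposures, cost_rate=0.0005):
    """Per-period strategy returns after proportional trading costs.

    exposures[t] is the position (fraction of capital) chosen at the close of
    period t and held over period t+1. Every change in position is charged
    cost_rate per unit of traded notional (hypothetical: half spread plus commission).
    The portfolio starts flat, so the first position pays to be opened.
    """
    net = []
    previous = 0.0
    for t in range(1, len(asset_returns)):
        position = exposures[t - 1]
        turnover = abs(position - previous)
        net.append(position * asset_returns[t] - cost_rate * turnover)
        previous = position
    return net
```

To reflect the point that different regimes justify different spreads, `cost_rate` could be turned into a per-period sequence instead of a constant. A signal whose edge disappears once realistic costs are charged is not a tradable signal.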


To learn more about how BAM can help you assess financial risk using our no-code proprietary machine learning tools, visit our website. We help risk and investment professionals, such as yourself, filter out the noise with concise and actionable signals. Our goal is to help you make better investment decisions.


Geraldo Filgueiras, Founder and CEO of BAM (@gerafilg)



