Digital Data: Gold Mine or Garbage Dump?
November 30th, 2021 3 minutes read
If you’d like to learn more about how to navigate digital data, read our previously published article “Backtesting in the Age of Financial Machine Learning.”
Did you know that the world’s digital data was projected to reach 40 zettabytes (40 trillion gigabytes) by the end of 2020? An amount of information that large is hard for the human mind to comprehend.
To put this in perspective, the average person creates an estimated 1.7MB of data every second, and that figure is rising. Plotted on a graph, the amount of digital data has been doubling every two years for the past 10 years.
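The arithmetic behind these figures can be checked with a minimal sketch. It takes the numbers quoted above at face value and assumes decimal units (1 GB = 1,000 MB, 1 ZB = 10^12 GB); the function and variable names are illustrative only.

```python
# Back-of-the-envelope check of the figures quoted in the text.
# Assumption: decimal units (1 GB = 1,000 MB; 1 ZB = 10**12 GB).

def growth_factor(years, doubling_period_years=2.0):
    """Total growth multiple after `years` if the quantity doubles every period."""
    return 2 ** (years / doubling_period_years)

ZB_IN_GB = 10 ** 12          # gigabytes in one zettabyte
SECONDS_PER_DAY = 86_400

# Doubling every two years for 10 years means five doublings.
ten_year_factor = growth_factor(10)

# 40 ZB expressed in gigabytes (the "40 trillion gigabytes" in the text).
forty_zb_in_gb = 40 * ZB_IN_GB

# 1.7 MB per second, accumulated over one day, in gigabytes.
per_person_per_day_gb = 1.7 * SECONDS_PER_DAY / 1000

print(f"Growth over 10 years: {ten_year_factor:.0f}x")
print(f"40 ZB = {forty_zb_in_gb:,} GB")
print(f"1.7 MB/s is about {per_person_per_day_gb:.1f} GB per person per day")
```

Five doublings compound to a 32-fold increase, which is why the curve looks so steep when plotted.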
In fact, by 2025, it is predicted that we will have 175ZB of data. But is all this data information? While it’s not wrong to think that the digital age has brought us a goldmine of information, the reality is that most of this data is noise. Traders looking for signals, or for clean, filtered data that supports their hypotheses, face a critical limitation: time.
Time is of critical importance when trading assets, because traders need to decide immediately, and heavy noise makes it difficult to see things clearly. Trading signals are never easy to find, and naive economic theory argues that under such conditions the search may be futile.
On the other hand, rational economic theory says that signals may exist in the presence of noise. However, the collective cost of discovering them is bounded by the benefits gained from trading on them. If a signal paid more than it cost to find, other traders would enter your space and the opportunity would disappear.
Understanding rational economic theory is essential to understanding the limitations of signals and the significance of proper backtesting [1].
Where should you look for signals?
For decades, economists have said that observing price behavior could reveal the direction of individual and collective market behavior. Historically, they believed the rise and fall of prices could be read as an indicator of market sentiment, reflecting both objective and subjective information. Such suggestions have never found solid theoretical or consistent statistical backing, whereas observing individuals' direct interests may yield more plausible results.
What’s great about having zettabytes of the world’s data at our disposal is that we can mine signals from online interactions. These seem to offer a new perspective on market participants' behavior in periods of large market movements. But what exactly should we test?
Why Nowcasting Instead of Forecasting
From a signal standpoint and given all the traps, traders must focus on nowcasting instead of the usual forecasting models. Through nowcasting, traders can get:
- Direct measurements that always hold true because they do not rely on a statistical lead-lag relationship
- Short-range predictions, which are statistically more reliable than long-range ones (this also implies that most published discoveries or signals in finance turn out to be false after a while)
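To see why a shorter horizon tends to be more reliable, consider a minimal sketch. It rests on a strong simplifying assumption that the article does not make: the quantity being predicted follows a plain random walk, and the prediction is the naive "tomorrow equals today" guess. The function name and parameters are hypothetical.

```python
import random

def naive_forecast_rmse(horizon, n_paths=2000, step_vol=1.0, seed=0):
    """Root-mean-square error of a 'no change' forecast made `horizon` steps ahead.

    Assumption: the series is a random walk with independent Gaussian steps,
    so the forecast error is simply the cumulative move over the horizon.
    """
    rng = random.Random(seed)
    squared_errors = []
    for _ in range(n_paths):
        cumulative_move = sum(rng.gauss(0.0, step_vol) for _ in range(horizon))
        squared_errors.append(cumulative_move ** 2)
    return (sum(squared_errors) / n_paths) ** 0.5

for h in (1, 5, 25):
    print(f"horizon={h:>2} steps  RMSE={naive_forecast_rmse(h):.2f}")
```

Under this assumption the error grows roughly with the square root of the horizon, so a prediction 25 steps ahead is about five times less precise than one made a single step ahead. Real markets are messier, but the direction of the effect is the point.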
Unfortunately, a common but shaky practice among some academics and many practitioners is to run tens of thousands of historical backtests to identify a promising investment strategy. The best result is then cherry-picked and reported as if only a single trial had taken place, and it becomes the basis for a publication or for launching a new fund.
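The danger of this practice can be illustrated with a small simulation. The sketch below is an assumption-laden toy, not a real backtesting framework: every "strategy" is pure noise with zero expected return, one year is taken to be 252 trading days, and the trial count is set to 1,000 rather than tens of thousands to keep it fast. All names are hypothetical.

```python
import random
import statistics

def annualized_sharpe(daily_returns, periods_per_year=252):
    """Annualized Sharpe ratio with a zero risk-free rate (a simplification)."""
    mean = statistics.fmean(daily_returns)
    stdev = statistics.stdev(daily_returns)
    return (mean / stdev) * periods_per_year ** 0.5

def best_of_n_backtests(n_trials, n_days=252, daily_vol=0.01, seed=0):
    """Backtest `n_trials` strategies that have no edge at all.

    Each strategy's daily returns are Gaussian noise with zero mean, so any
    Sharpe ratio above zero is luck. Returns (best Sharpe, average Sharpe).
    """
    rng = random.Random(seed)
    sharpes = []
    for _ in range(n_trials):
        returns = [rng.gauss(0.0, daily_vol) for _ in range(n_days)]
        sharpes.append(annualized_sharpe(returns))
    return max(sharpes), statistics.fmean(sharpes)

best, average = best_of_n_backtests(n_trials=1000)
print(f"Average Sharpe across trials: {average:.2f}")
print(f"Best Sharpe (the one that gets reported): {best:.2f}")
```

The average Sharpe ratio stays close to zero, as it should for strategies with no edge, while the best of the batch looks impressive. Reporting only that winner, without disclosing how many trials were run, presents luck as skill.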