Now it’s time to integrate the various building blocks of the machine learning for trading (ML4T) workflow that we have so far discussed separately. The goal of this chapter is to present an end-to-end perspective on the process of designing, simulating, and evaluating a trading strategy driven by an ML algorithm. To this end, we will demonstrate in more detail how to backtest an ML-driven strategy in a historical market context using the Python libraries backtrader and Zipline.
The ultimate objective of the ML4T workflow is to gather evidence from historical data that helps decide whether to deploy a candidate strategy in a live market and put financial resources at risk. This process builds on the skills you developed in the previous chapters because it relies on your ability to:
- work with a diverse set of data sources to engineer informative factors
- design ML models that generate predictive signals to inform your trading strategy
- optimize the resulting portfolio from a risk-return perspective
A realistic simulation of your strategy also needs to faithfully represent how security markets operate and how trades are executed. Therefore, the institutional details of exchanges, such as which order types are available and how prices are determined, also matter when you design a backtest or evaluate whether a backtesting engine includes the requisite features for accurate performance measurements. Finally, there are several methodological aspects that require attention to avoid biased results and false discoveries that will lead to poor investment decisions.
More specifically, after working through this chapter you will be able to:
- Plan and implement end-to-end strategy backtesting
- Understand and avoid critical pitfalls when implementing backtests
- Discuss the advantages and disadvantages of vectorized vs event-driven backtesting engines
- Identify and evaluate the key components of an event-driven backtester
- Design and execute the ML4T workflow using data sources at minute and daily frequencies, with ML models trained separately or as part of the backtest
- Know how to use Zipline and backtrader
The data used for some of the backtest simulations are generated by the script data_prep.py in the data directory.
Backtesting simulates an algorithmic strategy using historical data with the goal of identifying patterns that generalize to new market conditions. In addition to the generic challenges of predicting an uncertain future in changing markets, numerous factors make mistaking positive in-sample performance for the discovery of true patterns very likely.
These factors include aspects of the data, the implementation of the strategy simulation, and flaws with the statistical tests and their interpretation. The risks of false discoveries multiply with the use of more computing power, bigger datasets, and more complex algorithms that facilitate the identification of apparent patterns in the noise.
The most prominent challenge to backtest validity, including to published results, relates to the discovery of spurious patterns due to multiple testing during the strategy-selection process. Selecting a strategy after testing different candidates on the same data will likely bias the choice because a positive outcome is more likely to be due to the stochastic nature of the performance measure itself. In other words, the strategy is overly tailored, or overfit, to the data at hand and produces deceptively positive results.
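To see why selecting the best of many candidates tested on the same data is dangerous, consider a small simulation (a hypothetical illustration, not part of the book's code) that generates many strategies whose returns are pure noise and then picks the one with the highest in-sample Sharpe ratio:

```python
import numpy as np

# Simulate 1,000 "strategies" whose daily returns are zero-mean noise,
# then select the best in-sample annualized Sharpe ratio among them.
rng = np.random.default_rng(0)
n_strategies, n_days = 1000, 252
returns = rng.normal(loc=0.0, scale=0.01, size=(n_strategies, n_days))

# Annualized Sharpe ratio of each noise strategy
sharpe = returns.mean(axis=1) / returns.std(axis=1) * np.sqrt(252)
best = sharpe.max()
print(f"Best in-sample Sharpe among {n_strategies} noise strategies: {best:.2f}")
```

Although the true Sharpe ratio is zero for every strategy by construction, the maximum across 1,000 trials is typically around 3, which is exactly the selection bias that makes an unadjusted backtest-selected performance figure deceptive.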
Marcos Lopez de Prado has published extensively on the risks of backtest overfitting and how to detect or avoid it. This includes an online simulator of backtest overfitting.
Bailey and Lopez de Prado (2014) derive a deflated Sharpe ratio (SR) to compute the probability that an observed SR is statistically significant while controlling for the inflationary effect of multiple testing, non-normal returns, and shorter sample lengths.
The Python script deflated_sharpe_ratio in the directory multiple_testing contains the implementation, with references for the derivation of the related formulas.
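As a simplified sketch of the logic (not the book's script; function names and example parameters are my own), the deflated SR combines the probabilistic SR of Bailey and Lopez de Prado with the expected maximum SR across the number of trials performed:

```python
import numpy as np
from scipy.stats import norm

def probabilistic_sharpe_ratio(sr, sr_benchmark, n_obs, skew, kurt):
    """Probability that the true SR exceeds sr_benchmark, adjusting for
    sample length and non-normal returns (kurt is non-excess kurtosis)."""
    denom = np.sqrt(1 - skew * sr + (kurt - 1) / 4 * sr ** 2)
    return norm.cdf((sr - sr_benchmark) * np.sqrt(n_obs - 1) / denom)

def expected_max_sharpe(n_trials, var_trials):
    """Approximate expected maximum SR among n_trials independent trials
    whose SR estimates have cross-sectional variance var_trials."""
    gamma = 0.5772156649  # Euler-Mascheroni constant
    return np.sqrt(var_trials) * (
        (1 - gamma) * norm.ppf(1 - 1 / n_trials)
        + gamma * norm.ppf(1 - 1 / (n_trials * np.e))
    )

# Example: a daily SR of 0.1 (roughly 1.6 annualized) selected after testing
# 100 variants whose SR estimates have cross-sectional variance 0.01
sr_star = expected_max_sharpe(n_trials=100, var_trials=0.01)
dsr = probabilistic_sharpe_ratio(sr=0.1, sr_benchmark=sr_star,
                                 n_obs=252, skew=0.0, kurt=3.0)
```

Using the expected maximum of the noise trials as the benchmark deflates the apparently attractive SR: in this example the resulting probability falls well below conventional significance levels.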
- The Deflated Sharpe Ratio: Correcting for Selection Bias, Backtest Overfitting and Non-Normality, David H. Bailey and Marcos Lopez de Prado, Journal of Portfolio Management, 2013
- Backtesting, Marcos Lopez de Prado, 2015
- Optimal Stopping and Applications, Thomas Ferguson, Mathematics Department, UCLA
- Advances in Machine Learning Lectures 4/10 - Backtesting I, Marcos Lopez de Prado, 2018
- Advances in Machine Learning Lectures 5/10 - Backtesting II, Marcos Lopez de Prado, 2018
- The code examples for this section are in the subfolder multiple_testing.
- The code examples for this section are in the notebook vectorized_backtest.
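A vectorized backtest expresses positions and returns as array operations rather than an event loop. As a hypothetical illustration (not the notebook's contents), a simple long/short momentum rule on synthetic prices might look like:

```python
import numpy as np
import pandas as pd

# Synthetic daily close prices (random walk, for illustration only)
rng = np.random.default_rng(42)
prices = pd.Series(100 * np.exp(np.cumsum(rng.normal(0, 0.01, 252))))

returns = prices.pct_change()            # daily asset returns
signal = np.sign(prices.pct_change(20))  # +1/-1 based on 20-day momentum

# Shift the signal by one period so today's position relies only on
# yesterday's information, avoiding look-ahead bias
strategy_returns = (signal.shift(1) * returns).fillna(0)
cumulative = (1 + strategy_returns).cumprod()
```

The one-period `shift` is the crucial step: omitting it silently trades on same-day information and is one of the most common backtesting bugs.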
- The code examples for this section are in the notebook backtesting_with_backtrader.
- Backtrader website
The open-source Zipline library is an event-driven backtesting system maintained and used in production by the crowd-sourced quantitative investment fund Quantopian to facilitate algorithm development and live trading. It automates the algorithm's reaction to trade events and provides it with current and historical point-in-time data, avoiding look-ahead bias.
In Chapter 4, we introduced Zipline to simulate the computation of alpha factors, and in Chapter 5 we added trades to simulate a simple strategy, measure its performance, and optimize portfolio holdings using different techniques.
The code for this section is in the subdirectory ml4t_workflow_with_zipline:
- The notebook backtesting_with_zipline demonstrates the use of the `Pipeline` interface while loading ML predictions from another local (HDF5) data source.
- The notebook ml4t_with_zipline shows how to train an ML model locally as part of a `Pipeline` using a `CustomFactor` and various technical indicators as features for daily `bundle` data.
- The notebook ml4t_quantopian shows how to train an ML model on the Quantopian platform to utilize the broad range of data sources available there.
- The current release 1.3 has a few shortcomings, such as the dependency on benchmark data from the IEX exchange and limitations for importing features beyond the basic OHLCV data points.
- To enable the use of `zipline`, I've provided a patched version that works for the purposes of this book.
- Create a virtual environment based on Python 3.5, for instance using pyenv.
- After activating the virtual environment, run `pip install -U pip Cython`.
- Install the patched `zipline` version by cloning the repo, `cd` into the package's root folder, and run `pip install -e .`
- Run `pip install jupyter pyfolio`.