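Portfolio Allocation
Our paper: FinRL: A Deep Reinforcement Learning Library for Automated Stock Trading in Quantitative Finance.
Presented at NeurIPS 2020: Deep Reinforcement Learning Workshop.
The Jupyter notebook code is available on our Github and Google Colab.
See our previous tutorials, Single Stock Trading and Multiple Stock Trading, for a detailed explanation of the FinRL architecture and modules.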
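Overview
First, we explain the logic of portfolio allocation with deep reinforcement learning. Throughout this article we use the Dow Jones 30 constituents as the example, because they are the most popular stocks.
Suppose we have one million dollars at the beginning of 2019 and want to invest this $1,000,000 in the stock market, in this case the Dow Jones 30 constituents. Assume no margin, no short selling, and no treasury bills (all the money is used to trade these 30 stocks only). The weight of each individual stock is therefore non-negative, and the weights of all the stocks add up to one.
We hire a smart portfolio manager, Mr. Deep Reinforcement Learning. Mr. DRL gives us daily advice that includes the portfolio weights, that is, the proportion of money to invest in each of these 30 stocks. So every day we only need to rebalance the portfolio weights of the stocks. The basic logic is as follows.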

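Portfolio allocation differs from multiple stock trading because we are essentially rebalancing the weights at each time step, and we have to use all the available money.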
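The rebalancing constraint can be illustrated with a minimal sketch. The function name `target_dollars` and the three-stock example are hypothetical and not part of FinRL; the sketch only shows how non-negative weights that sum to one split the whole portfolio value across the stocks.

```python
import numpy as np

def target_dollars(weights, portfolio_value):
    """Split the whole portfolio value across stocks according to the weights.

    Hypothetical helper for illustration only; it is not a FinRL function.
    """
    weights = np.asarray(weights, dtype=float)
    if np.any(weights < 0):
        raise ValueError("weights must be non-negative (no short selling)")
    if not np.isclose(weights.sum(), 1.0):
        raise ValueError("weights must sum to 1 (all funds are invested)")
    return weights * portfolio_value

# Hypothetical three-stock portfolio instead of the 30 Dow constituents
weights = [0.5, 0.3, 0.2]
allocation = target_dollars(weights, 1_000_000)
print(allocation)  # dollar amount assigned to each stock
```

Because the weights sum to one, the dollar amounts always add up to the full portfolio value, which is what "use all the available money" means here.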
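The traditional and most popular way of doing portfolio allocation is mean-variance optimization, also known as Modern Portfolio Theory (MPT).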

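However, MPT does not perform well on out-of-sample data. MPT is calculated from stock returns only; if we want to take other relevant factors into account, for example technical indicators such as MACD or RSI, MPT may not be able to combine this information well.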
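To make the return-only nature of MPT concrete, here is a minimal sketch of one special case of mean-variance optimization: the global minimum-variance portfolio, which has the closed form w = Σ⁻¹1 / (1ᵀΣ⁻¹1). The covariance matrix below is made up for illustration, and the function name `min_variance_weights` is an assumption, not part of FinRL. Note that the only input is the covariance of returns; there is no place to plug in indicators such as MACD or RSI.

```python
import numpy as np

def min_variance_weights(cov):
    """Closed-form global minimum-variance weights (weights sum to 1).

    Illustrative helper only. The result may contain negative weights
    (short positions); enforcing non-negativity needs a constrained solver.
    """
    cov = np.asarray(cov, dtype=float)
    ones = np.ones(cov.shape[0])
    raw = np.linalg.solve(cov, ones)  # inv(cov) @ 1 without forming the inverse
    return raw / raw.sum()

# Hypothetical covariance matrix of returns for three stocks
cov = np.array([[0.04, 0.006, 0.0],
                [0.006, 0.09, 0.0],
                [0.0, 0.0, 0.01]])
w = min_variance_weights(cov)
equal = np.full(3, 1.0 / 3.0)

print("weights:", w)
print("variance (min-variance):", w @ cov @ w)
print("variance (equal weights):", equal @ cov @ equal)
```

The full mean-variance problem also uses expected returns and, in the setting of this tutorial, a no-short-selling constraint; both are left out of this sketch.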
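We introduce FinRL, a DRL library that helps beginners get exposure to quantitative finance. FinRL is designed specifically for automated stock trading, with an emphasis on education and demonstration.
This article focuses on one of the use cases in our paper: portfolio allocation. We use one Jupyter notebook that includes all the necessary steps.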
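Problem Definition
The problem is to design an automated trading solution for portfolio allocation. We model the stock trading process as a Markov Decision Process (MDP) and then formulate our trading goal as a maximization problem.
The components of the reinforcement learning environment are:
Action: the portfolio weight of each stock, within [0, 1]. We use a softmax function to normalize the actions so that they sum to 1.
State: {covariance matrix, MACD, RSI, CCI, ADX}. The state space has shape (34, 30): 34 rows (the 30 rows of the covariance matrix plus one row for each of the 4 technical indicators) and 30 columns (one per stock).
Reward function: r(s, a, s′) = p_t, where p_t is the cumulative portfolio value.
Environment: portfolio allocation for the Dow Jones 30 constituents.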
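The softmax normalization mentioned for the action can be sketched as follows. This is a minimal illustration with made-up raw actions, and the function name `softmax_normalization` is an assumption. The environment listing later in this tutorial normalizes by shifting with the minimum and dividing by the sum instead, so treat this as one possible way to obtain valid weights rather than the exact code used there.

```python
import numpy as np

def softmax_normalization(actions):
    """Map raw actions to positive weights that sum to 1."""
    actions = np.asarray(actions, dtype=float)
    shifted = actions - actions.max()  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

# Hypothetical raw actions for three stocks
raw_actions = np.array([0.2, 0.9, 0.5])
weights = softmax_normalization(raw_actions)
print(weights, weights.sum())
```

Softmax keeps the ordering of the raw actions and never produces a zero or negative weight, which matches the no-short-selling assumption.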
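The covariance matrix is a good feature because portfolio managers use it to quantify the risk (standard deviation) associated with a particular portfolio.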
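As a short illustration of how the covariance matrix quantifies risk, the standard deviation of a portfolio with weights w and covariance matrix Σ is sqrt(wᵀΣw). The two-stock numbers below are hypothetical, and `portfolio_std` is an illustrative name.

```python
import numpy as np

def portfolio_std(weights, cov):
    """Portfolio standard deviation sqrt(w' Σ w)."""
    weights = np.asarray(weights, dtype=float)
    cov = np.asarray(cov, dtype=float)
    return float(np.sqrt(weights @ cov @ weights))

# Hypothetical two-stock example with uncorrelated returns
cov = np.array([[0.04, 0.0],
                [0.0, 0.09]])
w = np.array([0.5, 0.5])
risk = portfolio_std(w, cov)
print(risk)  # about 0.1803, lower than either stock alone (0.2 and 0.3)
```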
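We also assume no transaction cost, because we want to keep this portfolio allocation case simple as a starting point.
Load Python Packages
Install the unstable development version of FinRL
# Install the unstable development version in Jupyter notebook:
!pip install git+https://github.com/AI4Finance-LLC/FinRL-Library.git
Import packages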
# import packages
import pandas as pd
import numpy as np
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt
import datetime

from finrl import config
from finrl import config_tickers
from finrl.marketdata.yahoodownloader import YahooDownloader
from finrl.preprocessing.preprocessors import FeatureEngineer
from finrl.preprocessing.data import data_split
from finrl.env.environment import EnvSetup
from finrl.env.EnvMultipleStock_train import StockEnvTrain
from finrl.env.EnvMultipleStock_trade import StockEnvTrade
from finrl.model.models import DRLAgent
from finrl.trade.backtest import BackTestStats, BaselineStats, BackTestPlot, backtest_strat, baseline_strat

# create the output directories if they do not exist yet
import os
if not os.path.exists("./" + config.DATA_SAVE_DIR):
    os.makedirs("./" + config.DATA_SAVE_DIR)
if not os.path.exists("./" + config.TRAINED_MODEL_DIR):
    os.makedirs("./" + config.TRAINED_MODEL_DIR)
if not os.path.exists("./" + config.TENSORBOARD_LOG_DIR):
    os.makedirs("./" + config.TENSORBOARD_LOG_DIR)
if not os.path.exists("./" + config.RESULTS_DIR):
    os.makedirs("./" + config.RESULTS_DIR)
Download Data
FinRL uses a YahooDownloader class to extract data.
class YahooDownloader:
    """
    Provides methods for retrieving daily stock data from Yahoo Finance API

    Attributes
    ----------
        start_date : str
            start date of the data (modified from config.py)
        end_date : str
            end date of the data (modified from config.py)
        ticker_list : list
            a list of stock tickers (modified from config.py)

    Methods
    -------
        fetch_data()
            Fetches data from yahoo API
    """
Download and save the data in a pandas DataFrame
# Download and save the data in a pandas DataFrame:
df = YahooDownloader(start_date = '2008-01-01',
                     end_date = '2020-12-01',
                     ticker_list = config_tickers.DOW_30_TICKER).fetch_data()
Preprocess Data
FinRL uses a FeatureEngineer class to preprocess data.
class FeatureEngineer:
    """
    Provides methods for preprocessing the stock price data

    Attributes
    ----------
        df: DataFrame
            data downloaded from Yahoo API
        feature_number : int
            number of features we used
        use_technical_indicator : boolean
            use technical indicators or not
        use_turbulence : boolean
            use turbulence index or not

    Methods
    -------
        preprocess_data()
            main method to do the feature engineering
    """
Perform feature engineering: covariance matrix + technical indicators
# Perform Feature Engineering:
df = FeatureEngineer(df.copy(),
                     use_technical_indicator=True,
                     use_turbulence=False).preprocess_data()

# add covariance matrix as states
df = df.sort_values(['date','tic'], ignore_index=True)
# use the integer position of each date as the index
df.index = df.date.factorize()[0]

cov_list = []
# look back one year (252 trading days)
lookback = 252
for i in range(lookback, len(df.index.unique())):
    data_lookback = df.loc[i-lookback:i, :]
    price_lookback = data_lookback.pivot_table(index='date', columns='tic', values='close')
    return_lookback = price_lookback.pct_change().dropna()
    covs = return_lookback.cov().values
    cov_list.append(covs)

# attach each day's covariance matrix to the rows of that day
df_cov = pd.DataFrame({'date': df.date.unique()[lookback:], 'cov_list': cov_list})
df = df.merge(df_cov, on='date')
df = df.sort_values(['date','tic']).reset_index(drop=True)
df.head()
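The rolling covariance and the resulting state layout can be checked on a small scale. The sketch below is an assumption-laden illustration, not FinRL code: it uses NumPy only, synthetic prices, 3 stocks instead of 30, a 5-day lookback instead of 252, and random numbers standing in for the four technical indicators.

```python
import numpy as np

rng = np.random.default_rng(0)
n_days, n_stocks, lookback = 12, 3, 5

# Synthetic close prices (hypothetical data), one column per stock
daily_returns = rng.normal(0.0, 0.01, size=(n_days, n_stocks))
prices = 100.0 * np.cumprod(1.0 + daily_returns, axis=0)

cov_list = []
for i in range(lookback, n_days):
    window = prices[i - lookback : i + 1]      # lookback + 1 prices, day i included
    returns = window[1:] / window[:-1] - 1.0   # lookback daily returns
    # columns are stocks, rows are observations; ddof=1 like DataFrame.cov()
    cov_list.append(np.cov(returns, rowvar=False))

# Stack four placeholder indicator rows under the latest covariance matrix,
# mirroring the (covariance + indicators) x stocks layout of the state
indicators = rng.normal(size=(4, n_stocks))
state = np.append(cov_list[-1], indicators, axis=0)

print(len(cov_list), state.shape)  # 7 (7, 3)
```

With 30 stocks and 4 indicators, the same stacking gives the (34, 30) state described in the problem definition.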

Build Environment
FinRL uses an EnvSetup class to set up the environment.
class EnvSetup:
    """
    Provides methods for setting up the training, validation, and trading environments

    Attributes
    ----------
        stock_dim: int
            number of unique stocks
        hmax : int
            maximum number of shares to trade
        initial_amount: int
            start money
        transaction_cost_pct : float
            transaction cost percentage per trade
        reward_scaling: float
            scaling factor for reward, good for training
        tech_indicator_list: list
            a list of technical indicator names (modified from config.py)

    Methods
    -------
        create_env_training()
            create env class for training
        create_env_validation()
            create env class for validation
        create_env_trading()
            create env class for trading
    """
Initialize an environment class
User-defined environment: a simulation environment class for portfolio allocation.
import numpy as np
import pandas as pd
from gym.utils import seeding
import gym
from gym import spaces
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt

class StockPortfolioEnv(gym.Env):
    """A portfolio allocation environment for OpenAI gym

    Attributes
    ----------
        df: DataFrame
            input data
        stock_dim : int
            number of unique stocks
        hmax : int
            maximum number of shares to trade
        initial_amount : int
            start money
        transaction_cost_pct: float
            transaction cost percentage per trade
        reward_scaling: float
            scaling factor for reward, good for training
        state_space: int
            the dimension of input features
        action_space: int
            equals stock dimension
        tech_indicator_list: list
            a list of technical indicator names
        turbulence_threshold: int
            a threshold to control risk aversion
        day: int
            an increment number to control date

    Methods
    -------
        step()
            at each step the agent will return actions, then
            we will calculate the reward, and return the next observation.
        reset()
            reset the environment
        render()
            use render to return other functions
        save_asset_memory()
            return the daily portfolio return at each time step
        save_action_memory()
            return actions/positions at each time step

    """
    metadata = {'render.modes': ['human']}

    def __init__(self,
                 df,
                 stock_dim,
                 hmax,
                 initial_amount,
                 transaction_cost_pct,
                 reward_scaling,
                 state_space,
                 action_space,
                 tech_indicator_list,
                 turbulence_threshold,
                 lookback=252,
                 day = 0):
        #super(StockEnv, self).__init__()
        #money = 10 , scope = 1
        self.day = day
        self.lookback = lookback
        self.df = df
        self.stock_dim = stock_dim
        self.hmax = hmax
        self.initial_amount = initial_amount
        self.transaction_cost_pct = transaction_cost_pct
        self.reward_scaling = reward_scaling
        self.state_space = state_space
        self.action_space = action_space
        self.tech_indicator_list = tech_indicator_list

        # action_space normalization and shape is self.stock_dim
        # (the integer passed in is replaced by a Box of that size)
        self.action_space = spaces.Box(low = 0, high = 1, shape = (self.action_space,))
        # Shape = (34, 30)
        # covariance matrix + technical indicators
        # covariances and indicators such as MACD can be negative,
        # so the lower bound is -inf rather than 0
        self.observation_space = spaces.Box(low=-np.inf,
                                            high=np.inf,
                                            shape = (self.state_space+len(self.tech_indicator_list),
                                                     self.state_space))

        # load data from a pandas dataframe
        self.data = self.df.loc[self.day,:]
        self.covs = self.data['cov_list'].values[0]
        self.state = np.append(np.array(self.covs),
                               [self.data[tech].values.tolist() for tech in self.tech_indicator_list], axis=0)
        self.terminal = False
        self.turbulence_threshold = turbulence_threshold
        # initialize the portfolio value with the starting capital
        self.portfolio_value = self.initial_amount

        # memorize portfolio value each step
        self.asset_memory = [self.initial_amount]
        # memorize portfolio return each step
        self.portfolio_return_memory = [0]
        # start from equal weights
        self.actions_memory = [[1/self.stock_dim]*self.stock_dim]
        self.date_memory = [self.data.date.unique()[0]]

    def step(self, actions):
        # print(self.day)
        self.terminal = self.day >= len(self.df.index.unique())-1
        # print(actions)

        if self.terminal:
            # end of the data: plot the returns and print summary statistics
            df = pd.DataFrame(self.portfolio_return_memory)
            df.columns = ['daily_return']
            plt.plot(df.daily_return.cumsum(),'r')
            plt.savefig('results/cumulative_reward.png')
            plt.close()

            plt.plot(self.portfolio_return_memory,'r')
            plt.savefig('results/rewards.png')
            plt.close()

            print("=================================")
            print("begin_total_asset:{}".format(self.asset_memory[0]))
            print("end_total_asset:{}".format(self.portfolio_value))

            df_daily_return = pd.DataFrame(self.portfolio_return_memory)
            df_daily_return.columns = ['daily_return']
            if df_daily_return['daily_return'].std() != 0:
                # annualized Sharpe ratio with 252 trading days per year
                sharpe = (252**0.5)*df_daily_return['daily_return'].mean()/ \
                         df_daily_return['daily_return'].std()
                print("Sharpe: ",sharpe)
            print("=================================")

            return self.state, self.reward, self.terminal, {}

        else:
            #print(actions)
            # actions are the portfolio weights
            # normalize to a sum of 1 by shifting with the minimum and dividing
            # by the total (a min-shift normalization rather than a softmax;
            # undefined when all actions are equal)
            shifted_actions = np.array(actions) - np.array(actions).min()
            norm_actions = shifted_actions / shifted_actions.sum()
            weights = norm_actions
            #print(weights)
            self.actions_memory.append(weights)
            last_day_memory = self.data

            # load next state
            self.day += 1
            self.data = self.df.loc[self.day,:]
            self.covs = self.data['cov_list'].values[0]
            self.state = np.append(np.array(self.covs), [self.data[tech].values.tolist() for tech in self.tech_indicator_list], axis=0)
            # calculate portfolio return
            # individual stocks' return * weight
            portfolio_return = sum(((self.data.close.values / last_day_memory.close.values)-1)*weights)
            # update portfolio value
            new_portfolio_value = self.portfolio_value*(1+portfolio_return)
            self.portfolio_value = new_portfolio_value

            # save into memory
            self.portfolio_return_memory.append(portfolio_return)
            self.date_memory.append(self.data.date.unique()[0])
            self.asset_memory.append(new_portfolio_value)

            # the reward is the new portfolio value or end portfolio value
            self.reward = new_portfolio_value
            #self.reward = self.reward*self.reward_scaling

        return self.state, self.reward, self.terminal, {}

    def reset(self):
        self.asset_memory = [self.initial_amount]
        self.day = 0
        self.data = self.df.loc[self.day,:]
        # load states
        self.covs = self.data['cov_list'].values[0]
        self.state = np.append(np.array(self.covs), [self.data[tech].values.tolist() for tech in self.tech_indicator_list], axis=0)
        self.portfolio_value = self.initial_amount
        #self.cost = 0
        #self.trades = 0
        self.terminal = False
        self.portfolio_return_memory = [0]
        self.actions_memory = [[1/self.stock_dim]*self.stock_dim]
        self.date_memory = [self.data.date.unique()[0]]
        return self.state

    def render(self, mode='human'):
        return self.state

    def save_asset_memory(self):
        date_list = self.date_memory
        portfolio_return = self.portfolio_return_memory
        #print(len(date_list))
        #print(len(asset_list))
        df_account_value = pd.DataFrame({'date':date_list,'daily_return':portfolio_return})
        return df_account_value

    def save_action_memory(self):
        # date and close price length must match actions length
        date_list = self.date_memory
        df_date = pd.DataFrame(date_list)
        df_date.columns = ['date']

        action_list = self.actions_memory
        df_actions = pd.DataFrame(action_list)
        df_actions.columns = self.data.tic.values
        df_actions.index = df_date.date
        #df_actions = pd.DataFrame({'date':date_list,'actions':action_list})
        return df_actions

    def _seed(self, seed=None):
        self.np_random, seed = seeding.np_random(seed)
        return [seed]
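The arithmetic inside the non-terminal branch of step() can be isolated in a few lines. This is a minimal sketch with made-up prices and weights; `step_portfolio` is a hypothetical helper, not a method of the environment. It computes each stock's one-day return, weights it, and compounds the portfolio value, with no transaction cost, as assumed in this tutorial.

```python
import numpy as np

def step_portfolio(value, weights, close_prev, close_next):
    """Return (new portfolio value, portfolio return) for one rebalancing step."""
    weights = np.asarray(weights, dtype=float)
    close_prev = np.asarray(close_prev, dtype=float)
    close_next = np.asarray(close_next, dtype=float)
    stock_returns = close_next / close_prev - 1.0   # per-stock one-day return
    portfolio_return = float(np.sum(stock_returns * weights))
    return value * (1.0 + portfolio_return), portfolio_return

# Hypothetical two-stock step: the first stock gains 10%, the second loses 10%
new_value, ret = step_portfolio(1_000_000, [0.75, 0.25], [100.0, 50.0], [110.0, 45.0])
print(ret, new_value)  # a return of about 5% and a value of about 1,050,000
```

In the environment above, the reward returned to the agent is this new portfolio value.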
Implement DRL Algorithms
FinRL uses a DRLAgent class to implement the algorithms.
class DRLAgent:
    """
    Provides implementations for DRL algorithms

    Attributes
    ----------
        env: gym environment class
            user-defined class

    Methods
    -------
        train_PPO()
            the implementation for PPO algorithm
        train_A2C()
            the implementation for A2C algorithm
        train_DDPG()
            the implementation for DDPG algorithm
        train_TD3()
            the implementation for TD3 algorithm
        DRL_prediction()
            make a prediction in a test dataset and get results
    """
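Model training:
We use A2C for portfolio allocation because it is stable, cost-effective, faster, and works better with large batch sizes.
Trading: assume that we have $1,000,000 of initial capital on 2019/01/01. We use the A2C model to perform portfolio allocation over the Dow Jones 30 stocks.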
# env_setup (an EnvSetup instance) and model_a2c (the trained A2C model)
# are assumed to come from the setup and training steps
trade = data_split(df,'2019-01-01', '2020-12-01')

env_trade, obs_trade = env_setup.create_env_trading(data = trade,
                                                    env_class = StockPortfolioEnv)

df_daily_return, df_actions = DRLAgent.DRL_prediction(model=model_a2c,
                                                      test_data = trade,
                                                      test_env = env_trade,
                                                      test_obs = obs_trade)

The output actions, that is, the portfolio weights, look like this:

Backtest Performance
FinRL uses a set of functions to do the backtesting with Quantopian pyfolio.
from pyfolio import timeseries
import pyfolio

DRL_strat = backtest_strat(df_daily_return)
perf_func = timeseries.perf_stats
perf_stats_all = perf_func(returns=DRL_strat,
                           factor_returns=DRL_strat,
                           positions=None, transactions=None, turnover_denom="AGB")
print("==============DRL Strategy Stats===========")
print(perf_stats_all)
print("==============Get Index Stats===========")
baseline_perf_stats = BaselineStats('^DJI',
                                    baseline_start = '2019-01-01',
                                    baseline_end = '2020-12-01')

# plot
dji, dow_strat = baseline_strat('^DJI','2019-01-01','2020-12-01')
%matplotlib inline
with pyfolio.plotting.plotting_context(font_scale=1.1):
    pyfolio.create_full_tear_sheet(returns = DRL_strat,
                                   benchmark_rets=dow_strat, set_context=False)
The table on the left shows the backtest performance statistics; the table on the right shows the index (DJIA) performance statistics.
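To make the kind of statistics in these tables easier to read, here is a minimal NumPy sketch of three common ones computed from a series of daily returns: annualized return, annualized Sharpe ratio (with a zero risk-free rate), and maximum drawdown. The function name `summarize_returns` and the four-day return series are hypothetical, and the formulas are the textbook definitions, not necessarily the exact pyfolio implementation.

```python
import numpy as np

def summarize_returns(daily_returns, periods_per_year=252):
    """Basic performance statistics from simple daily returns (illustrative only)."""
    r = np.asarray(daily_returns, dtype=float)
    # growth of 1 unit of capital, with the starting value prepended
    growth = np.concatenate(([1.0], np.cumprod(1.0 + r)))
    cumulative_return = growth[-1] - 1.0
    annual_return = growth[-1] ** (periods_per_year / len(r)) - 1.0
    std = r.std(ddof=1)
    sharpe = np.sqrt(periods_per_year) * r.mean() / std if std > 0 else float("nan")
    running_max = np.maximum.accumulate(growth)
    max_drawdown = float(np.min(growth / running_max - 1.0))
    return {
        "cumulative_return": float(cumulative_return),
        "annual_return": float(annual_return),
        "sharpe": float(sharpe),
        "max_drawdown": max_drawdown,
    }

# Hypothetical daily returns
stats = summarize_returns([0.01, -0.02, 0.015, 0.005])
for name, value in stats.items():
    print(f"{name}: {value:.4f}")
```

The Sharpe ratio here uses the same sqrt(252) * mean / std form that the environment prints at the end of an episode.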
Plots: