Portfolio Allocation

Our paper: FinRL: A Deep Reinforcement Learning Library for Automated Stock Trading in Quantitative Finance.

Presented at NeurIPS 2020: Deep RL Workshop.

The Jupyter notebook code is available on our Github and Google Colab.

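Tip

See our previous tutorials, Single Stock Trading and Multiple Stock Trading, for a detailed explanation of the FinRL architecture and modules.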

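Overview

First, we explain the logic of portfolio allocation using deep reinforcement learning. We use the Dow 30 constituents as the example throughout this article because they are among the most popular stocks.

Suppose we have one million dollars at the beginning of 2019. We want to invest this $1,000,000 in the stock market, in this case the Dow Jones 30 constituents. Assume no margin, no short selling, and no treasury bills (all the money is used to trade these 30 stocks only). The weight of each individual stock is therefore non-negative, and the weights of all stocks add up to one.

We hire a smart portfolio manager, Mr. Deep Reinforcement Learning. Mr. DRL gives us daily advice in the form of portfolio weights, that is, the proportion of money to invest in each of these 30 stocks. So every day we only need to rebalance the portfolio weights of the stocks. The basic logic is as follows.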

tutorial/image/portfolio_allocation_1.png

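Portfolio allocation differs from multiple stock trading because we essentially rebalance the weights at every time step and must use all of the available capital.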
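To make the daily rebalancing concrete, the following minimal sketch applies one day of advice to a three-stock portfolio. The prices, weights, and variable names are hypothetical illustration values; only the constraints (non-negative weights that sum to one, all capital invested) come from the setup described above.

```python
import numpy as np

# Hypothetical example values (3 stocks instead of 30).
portfolio_value = 1_000_000.0
weights = np.array([0.5, 0.3, 0.2])          # advice from the agent for day t
prev_close = np.array([100.0, 50.0, 200.0])  # closing prices on day t-1
close = np.array([101.0, 49.0, 204.0])       # closing prices on day t

# No short selling and no cash: weights are non-negative and sum to one.
assert np.all(weights >= 0) and np.isclose(weights.sum(), 1.0)

# Dollar amount held in each stock after rebalancing.
allocation = portfolio_value * weights

# The portfolio return is the weighted sum of the individual stock returns.
stock_returns = close / prev_close - 1
portfolio_return = float(np.dot(stock_returns, weights))
new_portfolio_value = portfolio_value * (1 + portfolio_return)

print(allocation, portfolio_return, new_portfolio_value)
```

With these numbers the three stocks return 1%, -2% and 2%, so the portfolio gains 0.3% over the day.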

The traditional and most popular way of doing portfolio allocation is mean-variance optimization, or Modern Portfolio Theory (MPT):

image/portfolio_allocation_2.png
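As a rough illustration of the mean-variance idea, the sketch below computes the global minimum-variance weights in closed form, w = Σ⁻¹1 / (1ᵀΣ⁻¹1). This is an assumption-laden simplification, not the method used in the paper: the closed form ignores expected returns and does not enforce the no-short-selling constraint, which in general requires a quadratic programming solver. The function name and the covariance values are hypothetical.

```python
import numpy as np

def min_variance_weights(cov):
    """Closed-form global minimum-variance weights (short selling allowed)."""
    ones = np.ones(cov.shape[0])
    inv_ones = np.linalg.solve(cov, ones)  # solves cov @ x = 1
    return inv_ones / inv_ones.sum()       # rescale so the weights sum to one

# Hypothetical covariance matrix of two uncorrelated assets.
cov = np.array([[0.04, 0.00],
                [0.00, 0.01]])
w = min_variance_weights(cov)
portfolio_variance = float(w @ cov @ w)
print(w, portfolio_variance)
```

The less volatile asset receives the larger weight, which is the behavior expected from a variance-minimizing allocation.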

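However, MPT does not perform well on out-of-sample data. MPT is calculated from stock returns only; if we want to take other relevant factors into account, for example technical indicators such as MACD or RSI, MPT may not be able to combine this information well.

We introduce a DRL library, FinRL, that makes it easier for beginners to get exposure to quantitative finance. FinRL is a DRL library designed specifically for automated stock trading, intended for educational and demonstrative purposes.

This article focuses on one of the use cases in our paper: portfolio allocation. We use one Jupyter notebook to include all the necessary steps.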

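Problem Definition

This problem is to design an automated trading solution for portfolio allocation. We model the stock trading process as a Markov Decision Process (MDP). We then formulate our trading goal as a maximization problem.

The components of the reinforcement learning environment are:

  • Action: the portfolio weight of each stock, within [0,1]. We use a softmax function to normalize the actions so that they sum to 1.

  • State: {Covariance Matrix, MACD, RSI, CCI, ADX}; the state space has shape (34, 30), where 34 is the number of rows and 30 is the number of columns.

  • Reward function: r(s, a, s′) = p_t, where p_t is the cumulative portfolio value.

  • Environment: portfolio allocation for the Dow 30 constituents.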
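The action and state definitions above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the library's code: `softmax_normalization` is a hypothetical helper name, and the state is filled with zeros only to show how a 30 x 30 covariance matrix stacked on four indicator rows yields the (34, 30) shape. Note that the environment code later in this tutorial normalizes actions by subtracting the minimum and dividing by the sum rather than with a softmax.

```python
import numpy as np

def softmax_normalization(actions):
    """Map raw actions to non-negative weights that sum to one."""
    actions = np.asarray(actions, dtype=float)
    exp = np.exp(actions - actions.max())  # subtract the max for numerical stability
    return exp / exp.sum()

raw_actions = np.array([0.2, 0.9, 0.5, 0.0])
weights = softmax_normalization(raw_actions)

# State layout: covariance matrix (30 x 30) stacked on top of one row per
# technical indicator (4 x 30). Zeros are placeholders for real data.
stock_dim = 30
num_indicators = 4  # MACD, RSI, CCI, ADX
cov = np.zeros((stock_dim, stock_dim))
indicators = np.zeros((num_indicators, stock_dim))
state = np.append(cov, indicators, axis=0)

print(weights, state.shape)
```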

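The covariance matrix is a good feature because portfolio managers use it to quantify the risk (standard deviation) associated with a particular portfolio.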
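For example, given a covariance matrix of daily returns Σ and a weight vector w, the portfolio variance is wᵀΣw and the risk is its square root. The numbers below are hypothetical, and the 252-trading-day annualization is a common convention assumed here.

```python
import numpy as np

# Hypothetical covariance matrix of daily returns for two stocks.
daily_cov = np.array([[0.0004, 0.0001],
                      [0.0001, 0.0009]])
weights = np.array([0.5, 0.5])

daily_variance = float(weights @ daily_cov @ weights)  # w' Σ w
daily_vol = np.sqrt(daily_variance)                    # daily standard deviation
annual_vol = daily_vol * np.sqrt(252)                  # assumes 252 trading days

print(daily_variance, daily_vol, annual_vol)
```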

We also assume no transaction costs, because we want to present a simple portfolio allocation case as a starting point.

Load Python Packages

Install the unstable development version of FinRL:

# Install the unstable development version in Jupyter notebook:
!pip install git+https://github.com/AI4Finance-LLC/FinRL-Library.git

Import Packages:

# import packages
import pandas as pd
import numpy as np
import matplotlib
matplotlib.use('Agg')  # select the non-interactive backend before importing pyplot
import matplotlib.pyplot as plt
import datetime

from finrl import config
from finrl import config_tickers
from finrl.marketdata.yahoodownloader import YahooDownloader
from finrl.preprocessing.preprocessors import FeatureEngineer
from finrl.preprocessing.data import data_split
from finrl.env.environment import EnvSetup
from finrl.env.EnvMultipleStock_train import StockEnvTrain
from finrl.env.EnvMultipleStock_trade import StockEnvTrade
from finrl.model.models import DRLAgent
from finrl.trade.backtest import BackTestStats, BaselineStats, BackTestPlot, backtest_strat, baseline_strat

import os
# create the output directories if they do not exist yet
for directory in (config.DATA_SAVE_DIR,
                  config.TRAINED_MODEL_DIR,
                  config.TENSORBOARD_LOG_DIR,
                  config.RESULTS_DIR):
    os.makedirs("./" + directory, exist_ok=True)

Download Data

FinRL uses the YahooDownloader class to extract data.

class YahooDownloader:
    """
    Provides methods for retrieving daily stock data from Yahoo Finance API

    Attributes
    ----------
        start_date : str
            start date of the data (modified from config.py)
        end_date : str
            end date of the data (modified from config.py)
        ticker_list : list
            a list of stock tickers (modified from config.py)

    Methods
    -------
        fetch_data()
            Fetches data from yahoo API
    """

Download and save the data in a pandas DataFrame:

# Download and save the data in a pandas DataFrame:
df = YahooDownloader(start_date = '2008-01-01',
                     end_date = '2020-12-01',
                     ticker_list = config_tickers.DOW_30_TICKER).fetch_data()

Preprocess Data

FinRL uses the FeatureEngineer class to preprocess data.

class FeatureEngineer:
    """
    Provides methods for preprocessing the stock price data

    Attributes
    ----------
        df: DataFrame
            data downloaded from Yahoo API
        feature_number : int
            number of features we used
        use_technical_indicator : boolean
            use technical indicator or not
        use_turbulence : boolean
            use turbulence index or not

    Methods
    -------
        preprocess_data()
            main method to do the feature engineering
    """

Perform Feature Engineering: covariance matrix + technical indicators

# Perform Feature Engineering:
df = FeatureEngineer(df.copy(),
                    use_technical_indicator=True,
                    use_turbulence=False).preprocess_data()


# add covariance matrix as states
df = df.sort_values(['date', 'tic'], ignore_index=True)
# index rows by trading day number so that df.loc[i] selects one day
df.index = df.date.factorize()[0]

cov_list = []
# look back is one year
lookback = 252
for i in range(lookback, len(df.index.unique())):
    # label-based slice: includes both day i-lookback and day i
    data_lookback = df.loc[i-lookback:i, :]
    price_lookback = data_lookback.pivot_table(index='date', columns='tic', values='close')
    return_lookback = price_lookback.pct_change().dropna()
    covs = return_lookback.cov().values
    cov_list.append(covs)

df_cov = pd.DataFrame({'date': df.date.unique()[lookback:], 'cov_list': cov_list})
df = df.merge(df_cov, on='date')
df = df.sort_values(['date', 'tic']).reset_index(drop=True)
df.head()
image/portfolio_allocation_3.png
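The rolling-covariance step can be tried in isolation on synthetic data. The sketch below follows the same windowing as the code above (lookback + 1 prices give lookback daily returns) but uses made-up tickers, randomly generated prices, and a short 5-day lookback, all of which are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic close prices for three hypothetical tickers over 10 business days.
rng = np.random.default_rng(0)
dates = pd.bdate_range("2020-01-01", periods=10)
tickers = ["AAA", "BBB", "CCC"]
daily_moves = rng.normal(0, 0.01, size=(len(dates), len(tickers)))
prices = pd.DataFrame(100 * np.cumprod(1 + daily_moves, axis=0),
                      index=dates, columns=tickers)

lookback = 5
cov_by_date = {}
for end in range(lookback, len(prices)):
    # lookback + 1 prices -> lookback daily returns
    window = prices.iloc[end - lookback:end + 1]
    returns = window.pct_change().dropna()
    cov_by_date[prices.index[end]] = returns.cov().values

first_cov = next(iter(cov_by_date.values()))
print(len(cov_by_date), first_cov.shape)
```

Each date from the sixth day onward gets one 3 x 3 covariance matrix, mirroring how the tutorial attaches a 30 x 30 matrix to every trading day after the first year.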

Build Environment

FinRL uses the EnvSetup class to set up the environment.

class EnvSetup:
    """
    Provides methods for setting up the training, validation and
    trading environments

    Attributes
    ----------
        stock_dim: int
            number of unique stocks
        hmax : int
            maximum number of shares to trade
        initial_amount: int
            start money
        transaction_cost_pct : float
            transaction cost percentage per trade
        reward_scaling: float
            scaling factor for reward, good for training
        tech_indicator_list: list
            a list of technical indicator names (modified from config.py)

    Methods
    -------
        create_env_training()
            create env class for training
        create_env_validation()
            create env class for validation
        create_env_trading()
            create env class for trading
    """

Initialize an environment class

User-defined environment: a simulation environment class. The environment for portfolio allocation:

import numpy as np
import pandas as pd
from gym.utils import seeding
import gym
from gym import spaces
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt

class StockPortfolioEnv(gym.Env):
    """A portfolio allocation environment for OpenAI gym
    Attributes
    ----------
        df: DataFrame
            input data
        stock_dim : int
            number of unique stocks
        hmax : int
            maximum number of shares to trade
        initial_amount : int
            start money
        transaction_cost_pct: float
            transaction cost percentage per trade
        reward_scaling: float
            scaling factor for reward, good for training
        state_space: int
            the dimension of input features
        action_space: int
            equals stock dimension
        tech_indicator_list: list
            a list of technical indicator names
        turbulence_threshold: int
            a threshold to control risk aversion
        day: int
            an increment number to control date
    Methods
    -------
    step()
        at each step the agent will return actions, then
        we will calculate the reward, and return the next observation.
    reset()
        reset the environment
    render()
        use render to return other functions
    save_asset_memory()
        return the daily portfolio return at each time step
    save_action_memory()
        return actions/positions at each time step

    """
    metadata = {'render.modes': ['human']}

    def __init__(self,
                df,
                stock_dim,
                hmax,
                initial_amount,
                transaction_cost_pct,
                reward_scaling,
                state_space,
                action_space,
                tech_indicator_list,
                turbulence_threshold,
                lookback=252,
                day = 0):
        self.day = day
        self.lookback = lookback
        self.df = df
        self.stock_dim = stock_dim
        self.hmax = hmax
        self.initial_amount = initial_amount
        self.transaction_cost_pct = transaction_cost_pct
        self.reward_scaling = reward_scaling
        self.state_space = state_space
        self.action_space = action_space
        self.tech_indicator_list = tech_indicator_list

        # action_space normalization and shape is self.stock_dim
        self.action_space = spaces.Box(low = 0, high = 1, shape = (self.action_space,))
        # Shape = (34, 30)
        # covariance matrix + technical indicators
        # covariances and indicators such as MACD can be negative,
        # so the lower bound is -inf rather than 0
        self.observation_space = spaces.Box(low=-np.inf,
                                            high=np.inf,
                                            shape = (self.state_space+len(self.tech_indicator_list),
                                                     self.state_space))

        # load data from a pandas dataframe
        self.data = self.df.loc[self.day,:]
        self.covs = self.data['cov_list'].values[0]
        self.state = np.append(np.array(self.covs),
                      [self.data[tech].values.tolist() for tech in self.tech_indicator_list ], axis=0)
        self.terminal = False
        self.turbulence_threshold = turbulence_threshold
        # initialize the portfolio value with the starting capital
        self.portfolio_value = self.initial_amount

        # memorize portfolio value each step
        self.asset_memory = [self.initial_amount]
        # memorize portfolio return each step
        self.portfolio_return_memory = [0]
        # start from an equally weighted portfolio
        self.actions_memory = [[1/self.stock_dim]*self.stock_dim]
        self.date_memory = [self.data.date.unique()[0]]


    def step(self, actions):
        self.terminal = self.day >= len(self.df.index.unique())-1

        if self.terminal:
            # end of the episode: plot the returns and print summary statistics
            df = pd.DataFrame(self.portfolio_return_memory)
            df.columns = ['daily_return']
            plt.plot(df.daily_return.cumsum(),'r')
            plt.savefig('results/cumulative_reward.png')
            plt.close()

            plt.plot(self.portfolio_return_memory,'r')
            plt.savefig('results/rewards.png')
            plt.close()

            print("=================================")
            print("begin_total_asset:{}".format(self.asset_memory[0]))
            print("end_total_asset:{}".format(self.portfolio_value))

            df_daily_return = pd.DataFrame(self.portfolio_return_memory)
            df_daily_return.columns = ['daily_return']
            if df_daily_return['daily_return'].std() != 0:
                # annualized Sharpe ratio (risk-free rate assumed to be zero)
                sharpe = (252**0.5)*df_daily_return['daily_return'].mean()/ \
                         df_daily_return['daily_return'].std()
                print("Sharpe: ",sharpe)
            print("=================================")

            return self.state, self.reward, self.terminal, {}

        else:
            # actions are the portfolio weights
            # normalize to sum of 1: shift by the minimum, then divide by the sum
            # (note: this is undefined if all actions are equal)
            actions = np.array(actions)
            shifted_actions = actions - actions.min()
            norm_actions = shifted_actions / shifted_actions.sum()
            weights = norm_actions
            self.actions_memory.append(weights)
            last_day_memory = self.data

            # load next state
            self.day += 1
            self.data = self.df.loc[self.day,:]
            self.covs = self.data['cov_list'].values[0]
            self.state = np.append(np.array(self.covs), [self.data[tech].values.tolist() for tech in self.tech_indicator_list ], axis=0)
            # calculate portfolio return
            # individual stocks' return * weight
            portfolio_return = sum(((self.data.close.values / last_day_memory.close.values)-1)*weights)
            # update portfolio value
            new_portfolio_value = self.portfolio_value*(1+portfolio_return)
            self.portfolio_value = new_portfolio_value

            # save into memory
            self.portfolio_return_memory.append(portfolio_return)
            self.date_memory.append(self.data.date.unique()[0])
            self.asset_memory.append(new_portfolio_value)

            # the reward is the new portfolio value or end portfolio value
            self.reward = new_portfolio_value
            #self.reward = self.reward*self.reward_scaling


        return self.state, self.reward, self.terminal, {}

    def reset(self):
        self.asset_memory = [self.initial_amount]
        self.day = 0
        self.data = self.df.loc[self.day,:]
        # load states
        self.covs = self.data['cov_list'].values[0]
        self.state = np.append(np.array(self.covs), [self.data[tech].values.tolist() for tech in self.tech_indicator_list ], axis=0)
        self.portfolio_value = self.initial_amount
        self.terminal = False
        self.portfolio_return_memory = [0]
        self.actions_memory = [[1/self.stock_dim]*self.stock_dim]
        self.date_memory = [self.data.date.unique()[0]]
        return self.state

    def render(self, mode='human'):
        return self.state

    def save_asset_memory(self):
        # despite the name, this returns the daily portfolio returns
        date_list = self.date_memory
        portfolio_return = self.portfolio_return_memory
        df_account_value = pd.DataFrame({'date':date_list,'daily_return':portfolio_return})
        return df_account_value

    def save_action_memory(self):
        # date and close price length must match actions length
        date_list = self.date_memory
        df_date = pd.DataFrame(date_list)
        df_date.columns = ['date']

        action_list = self.actions_memory
        df_actions = pd.DataFrame(action_list)
        df_actions.columns = self.data.tic.values
        df_actions.index = df_date.date
        return df_actions

    def _seed(self, seed=None):
        self.np_random, seed = seeding.np_random(seed)
        return [seed]
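At the end of an episode the environment prints an annualized Sharpe ratio computed as sqrt(252) * mean / std of the daily returns. The standalone sketch below reproduces that formula so it can be checked on a small series; `annualized_sharpe` is a hypothetical helper name, the sample returns are made up, and the risk-free rate is assumed to be zero, as in the environment code.

```python
import numpy as np
import pandas as pd

def annualized_sharpe(daily_returns, periods_per_year=252):
    """Annualized Sharpe ratio of daily returns, risk-free rate assumed zero."""
    r = pd.Series(daily_returns, dtype=float)
    std = r.std()  # sample standard deviation (ddof=1), as in pandas by default
    if std == 0 or np.isnan(std):
        return float("nan")  # undefined for a constant return series
    return float(np.sqrt(periods_per_year) * r.mean() / std)

sharpe = annualized_sharpe([0.01, -0.005, 0.002, 0.007, -0.001])
flat = annualized_sharpe([0.0, 0.0, 0.0])
print(sharpe, flat)
```

Returning NaN for a constant series mirrors the environment's guard, which skips the Sharpe printout when the standard deviation is zero.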

Implement DRL Algorithms

FinRL uses the DRLAgent class to implement the algorithms.

class DRLAgent:
    """
    Provides implementations for DRL algorithms

    Attributes
    ----------
        env: gym environment class
             user-defined class
    Methods
    -------
        train_PPO()
            the implementation for PPO algorithm
        train_A2C()
            the implementation for A2C algorithm
        train_DDPG()
            the implementation for DDPG algorithm
        train_TD3()
            the implementation for TD3 algorithm
        DRL_prediction()
            make a prediction in a test dataset and get results
    """

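Model Training:

We use A2C for portfolio allocation because it is stable, cost-effective, faster, and works better with large batch sizes.

Trading: assume that we have $1,000,000 of initial capital on 2019/01/01. We use the A2C model to perform portfolio allocation on the Dow 30 stocks.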

# env_setup (an EnvSetup instance) and model_a2c (the trained A2C model)
# are assumed to exist from the setup and training steps, which are not shown here
trade = data_split(df,'2019-01-01', '2020-12-01')

env_trade, obs_trade = env_setup.create_env_trading(data = trade,
                                         env_class = StockPortfolioEnv)

df_daily_return, df_actions = DRLAgent.DRL_prediction(model=model_a2c,
                        test_data = trade,
                        test_env = env_trade,
                        test_obs = obs_trade)
image/portfolio_allocation_4.png

The output actions, or portfolio weights, look like this:

image/portfolio_allocation_5.png

Backtesting Performance

FinRL uses a set of functions to do the backtesting with Quantopian pyfolio.

import pyfolio
from pyfolio import timeseries

DRL_strat = backtest_strat(df_daily_return)
perf_func = timeseries.perf_stats
perf_stats_all = perf_func( returns=DRL_strat,
                              factor_returns=DRL_strat,
                                positions=None, transactions=None, turnover_denom="AGB")
print("==============DRL Strategy Stats===========")
print(perf_stats_all)
print("==============Get Index Stats===========")
baseline_perf_stats=BaselineStats('^DJI',
                                  baseline_start = '2019-01-01',
                                  baseline_end = '2020-12-01')


# plot
dji, dow_strat = baseline_strat('^DJI','2019-01-01','2020-12-01')
%matplotlib inline
with pyfolio.plotting.plotting_context(font_scale=1.1):
        pyfolio.create_full_tear_sheet(returns = DRL_strat,
                                       benchmark_rets=dow_strat, set_context=False)

The table on the left shows the backtest performance statistics; the table on the right shows the index (DJIA) performance statistics.
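To give a feel for what such statistics measure, the sketch below computes a few of them directly from a series of daily returns with pandas. It is a simplified illustration under stated assumptions, not pyfolio's implementation: `basic_perf_stats` is a hypothetical helper, the sample returns are made up, and 252 trading days per year is assumed.

```python
import numpy as np
import pandas as pd

def basic_perf_stats(daily_returns, periods_per_year=252):
    """A few common backtest statistics computed from daily returns."""
    r = pd.Series(daily_returns, dtype=float)
    cumulative = (1 + r).cumprod()                # growth of one unit of capital
    total_return = cumulative.iloc[-1] - 1
    annual_return = (1 + total_return) ** (periods_per_year / len(r)) - 1
    annual_vol = r.std() * np.sqrt(periods_per_year)
    # running peak, counting the starting capital (1.0) as the first peak
    peak = cumulative.cummax().clip(lower=1.0)
    max_drawdown = (cumulative / peak - 1).min()
    return pd.Series({
        "Annual return": annual_return,
        "Cumulative returns": total_return,
        "Annual volatility": annual_vol,
        "Max drawdown": max_drawdown,
    })

stats = basic_perf_stats([0.10, -0.50, 0.20])
print(stats)
```

For this three-day series the capital goes from 1.0 to 1.1, 0.55 and 0.66, so the cumulative return is -34% and the maximum drawdown is -50%.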

Plots: