MarketFlow

MarketFlow transforms financial market data into machine learning models for making market predictions. The platform gets stock price data from Yahoo Finance (end-of-day) and Google Finance (intraday), transforming the data into canonical form for training and testing. MarketFlow is powerful because you can easily apply new features to groups of stocks simultaneously using our Variable Definition Language (VDL). All of the dataframes are aggregated and split into training and testing files for input into AlphaPy.

Market Pipeline

Data Sources

MarketFlow gets daily stock prices from Yahoo Finance and intraday stock prices from Google Finance. Both data sources have the standard primitives: Open, High, Low, Close, and Volume. For daily data, there is a Date timestamp and for intraday data, there is a Datetime timestamp. We augment the intraday data with a bar_number field to mark the end of the trading day. All trading days do not end at 4:00 pm EST, as there are holiday trading days that are shortened.

Amazon Daily Stock Prices (Source: Yahoo)
Date Open High Low Close Volume
2017-03-01 853.05 854.83 849.01 853.08 2752000
2017-03-02 853.08 854.82 847.28 848.91 2129200
2017-03-03 847.20 851.99 846.27 849.88 1941100
2017-03-06 845.23 848.49 841.12 846.61 2598400
2017-03-07 845.48 848.46 843.75 846.02 2217800
2017-03-08 848.00 853.07 846.79 850.50 2286500
2017-03-09 851.00 856.40 850.31 853.00 2040600
2017-03-10 857.00 857.35 851.72 852.46 2422000
2017-03-13 851.77 855.69 851.71 854.59 1906100
2017-03-14 853.55 853.75 847.55 852.53 2128700
2017-03-15 854.33 854.45 847.11 852.97 2556700
2017-03-16 855.30 855.50 850.51 853.42 1832600
2017-03-17 853.49 853.83 850.64 852.31 3380700
2017-03-20 851.51 857.80 851.01 856.97 2223300
2017-03-21 858.84 862.80 841.31 843.20 4349100
2017-03-22 840.43 849.37 839.05 848.06 2636200
2017-03-23 848.20 850.89 844.80 847.38 1945700
2017-03-24 851.68 851.80 843.53 845.61 2118300
2017-03-27 838.07 850.30 833.50 846.82 2754200
2017-03-28 851.75 858.46 850.10 856.00 3033000
2017-03-29 859.05 876.44 859.02 874.32 4464400
2017-03-30 874.95 877.06 871.66 876.34 2745800
2017-03-31 877.00 890.35 876.65 886.54 3910700

Note

Normal market hours are 9:30 am to 4:00 pm EST. Here, we retrieved the data from the CST time zone, one hour ahead.

Amazon Intraday Stock Prices (Source: Google)
datetime open high low close volume bar_number end_of_day
2017-03-31 08:30:00 877.00 877.06 876.66 877.06 43910 0 False
2017-03-31 08:40:00 877.00 881.74 876.65 880.66 192152 1 False
2017-03-31 08:50:00 881.00 886.01 880.45 884.67 249892 2 False
2017-03-31 09:00:00 884.83 886.08 883.50 883.90 148034 3 False
2017-03-31 09:10:00 883.90 886.44 883.84 885.72 118518 4 False
2017-03-31 09:20:00 885.82 886.39 884.68 886.30 76880 5 False
2017-03-31 09:30:00 886.14 886.74 885.07 885.73 74180 6 False
2017-03-31 09:40:00 885.80 886.20 885.13 886.20 77154 7 False
2017-03-31 09:50:00 886.21 887.61 885.77 887.51 86971 8 False
2017-03-31 10:00:00 887.59 888.35 886.83 887.81 111998 9 False
2017-03-31 10:10:00 887.80 888.72 887.59 888.60 64497 10 False
2017-03-31 10:20:00 888.62 890.35 888.44 889.82 101562 11 False
2017-03-31 10:30:00 889.81 889.96 888.83 889.83 42580 12 False
2017-03-31 10:40:00 889.70 889.92 887.32 887.61 88559 13 False
2017-03-31 10:50:00 887.68 889.58 887.66 889.01 45492 14 False
2017-03-31 11:00:00 889.12 889.26 887.25 888.34 39841 15 False
2017-03-31 11:10:00 888.52 889.00 887.66 887.66 24525 16 False
2017-03-31 11:20:00 887.83 888.74 887.67 888.14 35031 17 False
2017-03-31 11:30:00 888.14 888.87 888.10 888.87 24460 18 False
2017-03-31 11:40:00 888.91 889.47 888.76 888.96 38921 19 False
2017-03-31 11:50:00 888.87 889.06 888.47 888.89 35439 20 False
2017-03-31 12:00:00 888.86 889.50 888.82 889.40 25933 21 False
2017-03-31 12:10:00 889.40 889.94 889.35 889.86 35120 22 False
2017-03-31 12:20:00 889.90 890.25 889.51 889.57 38429 23 False
2017-03-31 12:30:00 889.70 889.79 889.20 889.79 18435 24 False
2017-03-31 12:40:00 889.80 889.93 889.28 889.50 25481 25 False
2017-03-31 12:50:00 889.60 889.99 889.50 889.77 26536 26 False
2017-03-31 13:00:00 889.70 889.73 888.33 888.49 35556 27 False
2017-03-31 13:10:00 888.31 888.64 887.80 888.45 39215 28 False
2017-03-31 13:20:00 888.58 888.58 887.09 887.15 43771 29 False
2017-03-31 13:30:00 887.18 888.40 887.05 888.13 36830 30 False
2017-03-31 13:40:00 888.22 888.99 887.68 888.78 29510 31 False
2017-03-31 13:50:00 888.79 888.99 888.35 888.58 30370 32 False
2017-03-31 14:00:00 888.66 888.82 887.02 887.02 48011 33 False
2017-03-31 14:10:00 887.02 888.30 886.80 888.15 41046 34 False
2017-03-31 14:20:00 888.14 889.00 888.06 888.55 38660 35 False
2017-03-31 14:30:00 888.58 888.83 888.30 888.40 39304 36 False
2017-03-31 14:40:00 888.40 888.60 888.01 888.45 57289 37 False
2017-03-31 14:50:00 888.39 889.32 888.16 888.17 105594 38 False
2017-03-31 15:00:00 888.43 888.54 886.54 886.94 518134 39 True

Note

You can get Google intraday data going back a maximum of 50 days. If you want to build your own historical record, then we recommend that you save the data on an ongoing basis for a a larger backtesting window.

Domain Configuration

The market configuration file (market.yml) is written in YAML and is divided into logical sections reflecting different parts of MarketFlow. This file is stored in the config directory of your project, along with the model.yml and algos.yml files. The market section has the following parameters:

data_history:
Number of periods of historical data to retrieve.
forecast_period:
Number of periods to forecast for the target variable.
fractal:
The time quantum for the data feed, represented by an integer followed by a character code. The string “1d” is one day, and “5m” is five minutes.
leaders:
A list of features that are coincident with the target variable. For example, with daily stock market data, the Open is considered to be a leader because it is recorded at the market open. In contrast, the daily High or Low cannot be known until the the market close.
predict_history:
This is the minimum number of periods required to derive all of the features in prediction mode on a given date. If you use a rolling mean of 50 days, then the predict_history should be set to at least 50 to have a valid value on the prediction date.
schema:
This string uniquely identifies the subject matter of the data. A schema could be prices for identifying market data.
target_group:
The name of the group selected from the groups section, e.g., a set of stock symbols.
market.yml
market:
    data_history    : 2000
    forecast_period : 1
    fractal         : 1d
    leaders         : ['gap', 'gapbadown', 'gapbaup', 'gapdown', 'gapup']
    predict_history : 100
    schema          : prices
    target_group    : test

groups:
    all  : ['aaoi', 'aapl', 'acia', 'adbe', 'adi', 'adp', 'agn', 'aig', 'akam',
            'algn', 'alk', 'alxn', 'amat', 'amba', 'amd', 'amgn', 'amt', 'amzn',
            'antm', 'arch', 'asml', 'athn', 'atvi', 'auph', 'avgo', 'axp', 'ayx',
            'azo', 'ba', 'baba', 'bac', 'bby', 'bidu', 'biib', 'brcd', 'bvsn',
            'bwld', 'c', 'cacc', 'cara', 'casy', 'cat', 'cde', 'celg', 'cern',
            'chkp', 'chtr', 'clvs', 'cme', 'cmg', 'cof', 'cohr', 'comm', 'cost',
            'cpk', 'crm', 'crus', 'csco', 'ctsh', 'ctxs', 'csx', 'cvs', 'cybr',
            'data', 'ddd', 'deck', 'dgaz', 'dia', 'dis', 'dish', 'dnkn', 'dpz',
            'drys', 'dust', 'ea', 'ebay', 'edc', 'edz', 'eem', 'elli', 'eog',
            'esrx', 'etrm', 'ewh', 'ewt', 'expe', 'fang', 'fas', 'faz', 'fb',
            'fcx', 'fdx', 'ffiv', 'fit', 'five', 'fnsr', 'fslr', 'ftnt', 'gddy',
            'gdx', 'gdxj', 'ge', 'gild', 'gld', 'glw', 'gm', 'googl', 'gpro',
            'grub', 'gs', 'gwph', 'hal', 'has', 'hd', 'hdp', 'hlf', 'hog', 'hum',
            'ibb', 'ibm', 'ice', 'idxx', 'ilmn', 'ilmn', 'incy', 'intc', 'intu',
            'ip', 'isrg', 'iwm', 'ivv', 'iwf', 'iwm', 'jack', 'jcp', 'jdst', 'jnj',
            'jnpr', 'jnug', 'jpm', 'kite', 'klac', 'ko', 'kss', 'labd', 'labu',
            'len', 'lite', 'lmt', 'lnkd', 'lrcx', 'lulu', 'lvs', 'mbly', 'mcd',
            'mchp', 'mdy', 'meoh', 'mnst', 'mo', 'momo', 'mon', 'mrk', 'ms', 'msft',
            'mtb', 'mu', 'nflx', 'nfx', 'nke', 'ntap', 'ntes', 'ntnx', 'nugt',
            'nvda', 'nxpi', 'nxst', 'oii', 'oled', 'orcl', 'orly', 'p', 'panw',
            'pcln', 'pg', 'pm', 'pnra', 'prgo', 'pxd', 'pypl', 'qcom', 'qqq',
            'qrvo', 'rht', 'sam', 'sbux', 'sds', 'sgen', 'shld', 'shop', 'sig',
            'sina', 'siri', 'skx', 'slb', 'slv', 'smh', 'snap', 'sncr', 'soda',
            'splk', 'spy', 'stld', 'stmp', 'stx', 'svxy', 'swks', 'symc', 't',
            'tbt', 'teva', 'tgt', 'tho', 'tlt', 'tmo', 'tna', 'tqqq', 'trip',
            'tsla', 'ttwo', 'tvix', 'twlo', 'twtr', 'tza', 'uaa', 'ugaz', 'uhs',
            'ulta', 'ulti', 'unh', 'unp', 'upro', 'uri', 'ups', 'uri', 'uthr',
            'utx', 'uvxy', 'v', 'veev', 'viav', 'vlo', 'vmc', 'vrsn', 'vrtx', 'vrx',
            'vwo', 'vxx', 'vz', 'wday', 'wdc', 'wfc', 'wfm', 'wmt', 'wynn', 'x',
            'xbi', 'xhb', 'xiv', 'xle', 'xlf', 'xlk', 'xlnx', 'xom', 'xlp', 'xlu',
            'xlv', 'xme', 'xom', 'wix', 'yelp', 'z']
    etf  : ['dia', 'dust', 'edc', 'edz', 'eem', 'ewh', 'ewt', 'fas', 'faz',
            'gld', 'hyg', 'iwm', 'ivv', 'iwf', 'jnk', 'mdy', 'nugt', 'qqq',
            'sds', 'smh', 'spy', 'tbt', 'tlt', 'tna', 'tvix', 'tza', 'upro',
            'uvxy', 'vwo', 'vxx', 'xhb', 'xiv', 'xle', 'xlf', 'xlk', 'xlp',
            'xlu', 'xlv', 'xme']
    tech : ['aapl', 'adbe', 'amat', 'amgn', 'amzn', 'avgo', 'baba', 'bidu',
            'brcd', 'csco', 'ddd', 'emc', 'expe', 'fb', 'fit', 'fslr', 'goog',
            'intc', 'isrg', 'lnkd', 'msft', 'nflx', 'nvda', 'pcln', 'qcom',
            'qqq', 'tsla', 'twtr']
    test : ['aapl', 'amzn', 'goog', 'fb', 'nvda', 'tsla']

features: ['abovema_3', 'abovema_5', 'abovema_10', 'abovema_20', 'abovema_50',
           'adx', 'atr', 'bigdown', 'bigup', 'diminus', 'diplus', 'doji',
           'gap', 'gapbadown', 'gapbaup', 'gapdown', 'gapup',
           'hc', 'hh', 'ho', 'hl', 'lc', 'lh', 'll', 'lo', 'hookdown', 'hookup',
           'inside', 'outside', 'madelta_3', 'madelta_5', 'madelta_7', 'madelta_10',
           'madelta_12', 'madelta_15', 'madelta_18', 'madelta_20', 'madelta',
           'net', 'netdown', 'netup', 'nr_3', 'nr_4', 'nr_5', 'nr_7', 'nr_8',
           'nr_10', 'nr_18', 'roi', 'roi_2', 'roi_3', 'roi_4', 'roi_5', 'roi_10',
           'roi_20', 'rr_1_4', 'rr_1_7', 'rr_1_10', 'rr_2_5', 'rr_2_7', 'rr_2_10',
           'rr_3_8', 'rr_3_14', 'rr_4_10', 'rr_4_20', 'rr_5_10', 'rr_5_20',
           'rr_5_30', 'rr_6_14', 'rr_6_25', 'rr_7_14', 'rr_7_35', 'rr_8_22',
           'rrhigh', 'rrlow', 'rrover', 'rrunder', 'rsi_3', 'rsi_4', 'rsi_5',
           'rsi_6', 'rsi_8', 'rsi_10', 'rsi_14', 'sep_3_3', 'sep_5_5', 'sep_8_8',
           'sep_10_10', 'sep_14_14', 'sep_21_21', 'sep_30_30', 'sep_40_40',
           'sephigh', 'seplow', 'trend', 'vma', 'vmover', 'vmratio', 'vmunder',
           'volatility_3', 'volatility_5', 'volatility', 'volatility_20',
           'wr_2', 'wr_3', 'wr', 'wr_5', 'wr_6', 'wr_7', 'wr_10']

aliases:
    atr        : 'ma_truerange'
    aver       : 'ma_hlrange'
    cma        : 'ma_close'
    cmax       : 'highest_close'
    cmin       : 'lowest_close'
    hc         : 'higher_close'
    hh         : 'higher_high'
    hl         : 'higher_low'
    ho         : 'higher_open'
    hmax       : 'highest_high'
    hmin       : 'lowest_high'
    lc         : 'lower_close'
    lh         : 'lower_high'
    ll         : 'lower_low'
    lo         : 'lower_open'
    lmax       : 'highest_low'
    lmin       : 'lowest_low'
    net        : 'net_close'
    netdown    : 'down_net'
    netup      : 'up_net'
    omax       : 'highest_open'
    omin       : 'lowest_open'
    rmax       : 'highest_hlrange'
    rmin       : 'lowest_hlrange'
    rr         : 'maratio_hlrange'
    rixc       : 'rindex_close_high_low'
    rixo       : 'rindex_open_high_low'
    roi        : 'netreturn_close'
    rsi        : 'rsi_close'
    sepma      : 'ma_sep'
    vma        : 'ma_volume'
    vmratio    : 'maratio_volume'
    upmove     : 'net_high'

variables:
    abovema    : 'close > cma_50'
    belowma    : 'close < cma_50'
    bigup      : 'rrover & sephigh & netup'
    bigdown    : 'rrover & sephigh & netdown'
    doji       : 'sepdoji & rrunder'
    hookdown   : 'open > high[1] & close < close[1]'
    hookup     : 'open < low[1] & close > close[1]'
    inside     : 'low > low[1] & high < high[1]'
    madelta    : '(close - cma_50) / atr_10'
    nr         : 'hlrange == rmin_4'
    outside    : 'low < low[1] & high > high[1]'
    roihigh    : 'roi_5 >= 5'
    roilow     : 'roi_5 < -5'
    roiminus   : 'roi_5 < 0'
    roiplus    : 'roi_5 > 0'
    rrhigh     : 'rr_1_10 >= 1.2'
    rrlow      : 'rr_1_10 <= 0.8'
    rrover     : 'rr_1_10 >= 1.0'
    rrunder    : 'rr_1_10 < 1.0'
    sep        : 'rixc_1 - rixo_1'
    sepdoji    : 'abs(sep) <= 15'
    sephigh    : 'abs(sep_1_1) >= 70'
    seplow     : 'abs(sep_1_1) <= 30'
    trend      : 'rrover & sephigh'
    vmover     : 'vmratio >= 1'
    vmunder    : 'vmratio < 1'
    volatility : 'atr_10 / close'
    wr         : 'hlrange == rmax_4'

Group Analysis

The cornerstone of MarketFlow is the Analysis. You can create models and forecasts for different groups of stocks. The purpose of the analysis object is to gather data for all of the group members and then consolidate the data into train and test files. Further, some features and the target variable have to be adjusted (lagged) to avoid data leakage.

A group is simply a collection of symbols for analysis. In this example, we create different groups for technology stocks, ETFs, and a smaller group for testing. To create a model for a given group, simply set the target_group in the market section of the market.yml file and run mflow.

market.yml
groups:
    all  : ['aaoi', 'aapl', 'acia', 'adbe', 'adi', 'adp', 'agn', 'aig', 'akam',
            'algn', 'alk', 'alxn', 'amat', 'amba', 'amd', 'amgn', 'amt', 'amzn',
            'antm', 'arch', 'asml', 'athn', 'atvi', 'auph', 'avgo', 'axp', 'ayx',
            'azo', 'ba', 'baba', 'bac', 'bby', 'bidu', 'biib', 'brcd', 'bvsn',
            'bwld', 'c', 'cacc', 'cara', 'casy', 'cat', 'cde', 'celg', 'cern',
            'chkp', 'chtr', 'clvs', 'cme', 'cmg', 'cof', 'cohr', 'comm', 'cost',
            'cpk', 'crm', 'crus', 'csco', 'ctsh', 'ctxs', 'csx', 'cvs', 'cybr',
            'data', 'ddd', 'deck', 'dgaz', 'dia', 'dis', 'dish', 'dnkn', 'dpz',
            'drys', 'dust', 'ea', 'ebay', 'edc', 'edz', 'eem', 'elli', 'eog',
            'esrx', 'etrm', 'ewh', 'ewt', 'expe', 'fang', 'fas', 'faz', 'fb',
            'fcx', 'fdx', 'ffiv', 'fit', 'five', 'fnsr', 'fslr', 'ftnt', 'gddy',
            'gdx', 'gdxj', 'ge', 'gild', 'gld', 'glw', 'gm', 'googl', 'gpro',
            'grub', 'gs', 'gwph', 'hal', 'has', 'hd', 'hdp', 'hlf', 'hog', 'hum',
            'ibb', 'ibm', 'ice', 'idxx', 'ilmn', 'ilmn', 'incy', 'intc', 'intu',
            'ip', 'isrg', 'iwm', 'ivv', 'iwf', 'iwm', 'jack', 'jcp', 'jdst', 'jnj',
            'jnpr', 'jnug', 'jpm', 'kite', 'klac', 'ko', 'kss', 'labd', 'labu',
            'len', 'lite', 'lmt', 'lnkd', 'lrcx', 'lulu', 'lvs', 'mbly', 'mcd',
            'mchp', 'mdy', 'meoh', 'mnst', 'mo', 'momo', 'mon', 'mrk', 'ms', 'msft',
            'mtb', 'mu', 'nflx', 'nfx', 'nke', 'ntap', 'ntes', 'ntnx', 'nugt',
            'nvda', 'nxpi', 'nxst', 'oii', 'oled', 'orcl', 'orly', 'p', 'panw',
            'pcln', 'pg', 'pm', 'pnra', 'prgo', 'pxd', 'pypl', 'qcom', 'qqq',
            'qrvo', 'rht', 'sam', 'sbux', 'sds', 'sgen', 'shld', 'shop', 'sig',
            'sina', 'siri', 'skx', 'slb', 'slv', 'smh', 'snap', 'sncr', 'soda',
            'splk', 'spy', 'stld', 'stmp', 'stx', 'svxy', 'swks', 'symc', 't',
            'tbt', 'teva', 'tgt', 'tho', 'tlt', 'tmo', 'tna', 'tqqq', 'trip',
            'tsla', 'ttwo', 'tvix', 'twlo', 'twtr', 'tza', 'uaa', 'ugaz', 'uhs',
            'ulta', 'ulti', 'unh', 'unp', 'upro', 'uri', 'ups', 'uri', 'uthr',
            'utx', 'uvxy', 'v', 'veev', 'viav', 'vlo', 'vmc', 'vrsn', 'vrtx', 'vrx',
            'vwo', 'vxx', 'vz', 'wday', 'wdc', 'wfc', 'wfm', 'wmt', 'wynn', 'x',
            'xbi', 'xhb', 'xiv', 'xle', 'xlf', 'xlk', 'xlnx', 'xom', 'xlp', 'xlu',
            'xlv', 'xme', 'xom', 'wix', 'yelp', 'z']
    etf  : ['dia', 'dust', 'edc', 'edz', 'eem', 'ewh', 'ewt', 'fas', 'faz',
            'gld', 'hyg', 'iwm', 'ivv', 'iwf', 'jnk', 'mdy', 'nugt', 'qqq',
            'sds', 'smh', 'spy', 'tbt', 'tlt', 'tna', 'tvix', 'tza', 'upro',
            'uvxy', 'vwo', 'vxx', 'xhb', 'xiv', 'xle', 'xlf', 'xlk', 'xlp',
            'xlu', 'xlv', 'xme']
    tech : ['aapl', 'adbe', 'amat', 'amgn', 'amzn', 'avgo', 'baba', 'bidu',
            'brcd', 'csco', 'ddd', 'emc', 'expe', 'fb', 'fit', 'fslr', 'goog',
            'intc', 'isrg', 'lnkd', 'msft', 'nflx', 'nvda', 'pcln', 'qcom',
            'qqq', 'tsla', 'twtr']
    test : ['aapl', 'amzn', 'goog', 'fb', 'nvda', 'tsla']

Variables and Aliases

Because market analysis encompasses a wide array of technical indicators, you can define features using the Variable Definition Language (VDL). The concept is simple: flatten out a function call and its parameters into a string, and that string represents the variable name. You can use the technical analysis functions in AlphaPy, or define your own.

Let’s define a feature that indicates whether or not a stock is above its 50-day closing moving average. The alphapy.market_variables module has a function ma to calculate a rolling mean. It has two parameters: the name of the dataframe’s column and the period over which to calculate the mean. So, the corresponding variable name is ma_close_50.

Typically, a moving average is calculated with the closing price, so we can define an alias cma which represents the closing moving average. An alias is simply a substitution mechanism for replacing one string with an abbreviation. Instead of ma_close_50, we can now refer to cma_50 using an alias.

Finally, we can define the variable abovema with a relational expression. Note that numeric values in the expression can be substituted when defining features, e.g., abovema_20.

market.yml
features: ['abovema_50']

aliases:
    cma        : 'ma_close'

variables:
    abovema    : 'close > cma_50'

Here are more examples of aliases.

market.yml
aliases:
    atr        : 'ma_truerange'
    aver       : 'ma_hlrange'
    cma        : 'ma_close'
    cmax       : 'highest_close'
    cmin       : 'lowest_close'
    hc         : 'higher_close'
    hh         : 'higher_high'
    hl         : 'higher_low'
    ho         : 'higher_open'
    hmax       : 'highest_high'
    hmin       : 'lowest_high'
    lc         : 'lower_close'
    lh         : 'lower_high'
    ll         : 'lower_low'
    lo         : 'lower_open'
    lmax       : 'highest_low'
    lmin       : 'lowest_low'
    net        : 'net_close'
    netdown    : 'down_net'
    netup      : 'up_net'
    omax       : 'highest_open'
    omin       : 'lowest_open'
    rmax       : 'highest_hlrange'
    rmin       : 'lowest_hlrange'
    rr         : 'maratio_hlrange'
    rixc       : 'rindex_close_high_low'
    rixo       : 'rindex_open_high_low'
    roi        : 'netreturn_close'
    rsi        : 'rsi_close'
    sepma      : 'ma_sep'
    vma        : 'ma_volume'
    vmratio    : 'maratio_volume'
    upmove     : 'net_high'

Variable expressions are valid Python expressions, with the addition of offsets to reference previous values.

market.yml
variables:
    abovema    : 'close > cma_50'
    belowma    : 'close < cma_50'
    bigup      : 'rrover & sephigh & netup'
    bigdown    : 'rrover & sephigh & netdown'
    doji       : 'sepdoji & rrunder'
    hookdown   : 'open > high[1] & close < close[1]'
    hookup     : 'open < low[1] & close > close[1]'
    inside     : 'low > low[1] & high < high[1]'
    madelta    : '(close - cma_50) / atr_10'
    nr         : 'hlrange == rmin_4'
    outside    : 'low < low[1] & high > high[1]'
    roihigh    : 'roi_5 >= 5'
    roilow     : 'roi_5 < -5'
    roiminus   : 'roi_5 < 0'
    roiplus    : 'roi_5 > 0'
    rrhigh     : 'rr_1_10 >= 1.2'
    rrlow      : 'rr_1_10 <= 0.8'
    rrover     : 'rr_1_10 >= 1.0'
    rrunder    : 'rr_1_10 < 1.0'
    sep        : 'rixc_1 - rixo_1'
    sepdoji    : 'abs(sep) <= 15'
    sephigh    : 'abs(sep_1_1) >= 70'
    seplow     : 'abs(sep_1_1) <= 30'
    trend      : 'rrover & sephigh'
    vmover     : 'vmratio >= 1'
    vmunder    : 'vmratio < 1'
    volatility : 'atr_10 / close'
    wr         : 'hlrange == rmax_4'

Once the aliases and variables are defined, a foundation is established for defining all of the features that you want to test.

market.yml
features: ['abovema_3', 'abovema_5', 'abovema_10', 'abovema_20', 'abovema_50',
           'adx', 'atr', 'bigdown', 'bigup', 'diminus', 'diplus', 'doji',
           'gap', 'gapbadown', 'gapbaup', 'gapdown', 'gapup',
           'hc', 'hh', 'ho', 'hl', 'lc', 'lh', 'll', 'lo', 'hookdown', 'hookup',
           'inside', 'outside', 'madelta_3', 'madelta_5', 'madelta_7', 'madelta_10',
           'madelta_12', 'madelta_15', 'madelta_18', 'madelta_20', 'madelta',
           'net', 'netdown', 'netup', 'nr_3', 'nr_4', 'nr_5', 'nr_7', 'nr_8',
           'nr_10', 'nr_18', 'roi', 'roi_2', 'roi_3', 'roi_4', 'roi_5', 'roi_10',
           'roi_20', 'rr_1_4', 'rr_1_7', 'rr_1_10', 'rr_2_5', 'rr_2_7', 'rr_2_10',
           'rr_3_8', 'rr_3_14', 'rr_4_10', 'rr_4_20', 'rr_5_10', 'rr_5_20',
           'rr_5_30', 'rr_6_14', 'rr_6_25', 'rr_7_14', 'rr_7_35', 'rr_8_22',
           'rrhigh', 'rrlow', 'rrover', 'rrunder', 'rsi_3', 'rsi_4', 'rsi_5',
           'rsi_6', 'rsi_8', 'rsi_10', 'rsi_14', 'sep_3_3', 'sep_5_5', 'sep_8_8',
           'sep_10_10', 'sep_14_14', 'sep_21_21', 'sep_30_30', 'sep_40_40',
           'sephigh', 'seplow', 'trend', 'vma', 'vmover', 'vmratio', 'vmunder',
           'volatility_3', 'volatility_5', 'volatility', 'volatility_20',
           'wr_2', 'wr_3', 'wr', 'wr_5', 'wr_6', 'wr_7', 'wr_10']

Trading Systems

Market Pipeline

MarketFlow provides two out-of-the-box trading systems. The first is a long/short system that you define using the system features in the configuration file market.yml. When MarketFlow detects a system in the file, it knows to execute that particular long/short strategy.

market.yml
market:
    data_history    : 1000
    forecast_period : 1
    fractal         : 1d
    leaders         : []
    predict_history : 50
    schema          : prices
    target_group    : faang

system:
    name       : 'closer'
    holdperiod : 0
    longentry  : hc
    longexit   :
    shortentry : lc
    shortexit  :
    scale      : False

groups:
    faang      : ['fb', 'aapl', 'amzn', 'nflx', 'googl']

features       : ['hc', 'lc']

aliases:
    hc         : 'higher_close'
    lc         : 'lower_close'
name:
Unique identifier for the trading system.
holdperiod:
Number of periods to hold an open position.
longentry:
A conditional feature to establish when to open a long position.
longexit:
A conditional feature to establish when to close a long position.
shortentry:
A conditional feature to establish when to open a short position.
shortexit:
A conditional feature to establish when to close a short position.
scale:
When True, add to a position in the same direction. The default action is not to scale positions.

The second system is an open range breakout strategy. The premise of the system is to wait for an established high-low range in the first n minutes (e.g., 30) and then wait for a breakout of either the high or the low, especially when the range is relatively narrow. Typically, a stop-loss is set at the other side of the breakout range.

After a system runs, four output files are stored in the system directory; the first three are formatted for analysis by Quantopian’s pyfolio package. The last file is the list of trades generated by MarketFlow based on the system specifications.

  • [group]_[system]_transactions_[fractal].csv
  • [group]_[system]_positions_[fractal].csv
  • [group]_[system]_returns_[fractal].csv
  • [group]_[system]_trades_[fractal].csv

If we developed a moving average crossover system on daily data for technology stocks, then the trades file could be named:

tech_xma_trades_1d.csv

The important point here is to reserve a namespace for different combinations of groups, systems, and fractals to compare performance over space and time.

Model Configuration

MarketFlow runs on top of AlphaPy, so the model.yml file has the same format. In the following example, note the use of treatments to calculate runs for a set of features.

model.yml
project:
    directory         : .
    file_extension    : csv
    submission_file   :
    submit_probas     : False

data:
    drop              : ['date', 'tag', 'open', 'high', 'low', 'close', 'volume', 'adjclose',
                         'low[1]', 'high[1]', 'net', 'close[1]', 'rmin_3', 'rmin_4', 'rmin_5',
                         'rmin_7', 'rmin_8', 'rmin_10', 'rmin_18', 'pval', 'mval', 'vma',
                         'rmax_2', 'rmax_3', 'rmax_4', 'rmax_5', 'rmax_6', 'rmax_7', 'rmax_10']
    features          : '*'
    sampling          :
        option        : True
        method        : under_random
        ratio         : 0.5
    sentinel          : -1
    separator         : ','
    shuffle           : True
    split             : 0.4
    target            : rrover
    target_value      : True

model:
    algorithms        : ['RF']
    balance_classes   : True
    calibration       :
        option        : False
        type          : isotonic
    cv_folds          : 3
    estimators        : 501
    feature_selection :
        option        : True
        percentage    : 50
        uni_grid      : [5, 10, 15, 20, 25]
        score_func    : f_classif
    grid_search       :
        option        : False
        iterations    : 100
        random        : True
        subsample     : True
        sampling_pct  : 0.25
    pvalue_level      : 0.01
    rfe               :
        option        : True
        step          : 10
    scoring_function  : 'roc_auc'
    type              : classification

features:
    clustering        :
        option        : False
        increment     : 3
        maximum       : 30
        minimum       : 3
    counts            :
        option        : False
    encoding          :
        rounding      : 3
        type          : factorize
    factors           : []
    interactions      :
        option        : True
        poly_degree   : 2
        sampling_pct  : 5
    isomap            :
        option        : False
        components    : 2
        neighbors     : 5
    logtransform      :
        option        : False
    numpy             :
        option        : False
    pca               :
        option        : False
        increment     : 3
        maximum       : 15
        minimum       : 3
        whiten        : False
    scaling           :
        option        : True
        type          : standard
    scipy             :
        option        : False
    text              :
        ngrams        : 1
        vectorize     : False
    tsne              :
        option        : False
        components    : 2
        learning_rate : 1000.0
        perplexity    : 30.0
    variance          :
        option        : True
        threshold     : 0.1

treatments:
    doji              : ['alphapy.features', 'runs_test', ['all'], 18]
    hc                : ['alphapy.features', 'runs_test', ['all'], 18]
    hh                : ['alphapy.features', 'runs_test', ['all'], 18]
    hl                : ['alphapy.features', 'runs_test', ['all'], 18]
    ho                : ['alphapy.features', 'runs_test', ['all'], 18]
    rrhigh            : ['alphapy.features', 'runs_test', ['all'], 18]
    rrlow             : ['alphapy.features', 'runs_test', ['all'], 18]
    rrover            : ['alphapy.features', 'runs_test', ['all'], 18]
    rrunder           : ['alphapy.features', 'runs_test', ['all'], 18]
    sephigh           : ['alphapy.features', 'runs_test', ['all'], 18]
    seplow            : ['alphapy.features', 'runs_test', ['all'], 18]
    trend             : ['alphapy.features', 'runs_test', ['all'], 18]

pipeline:
    number_jobs       : -1
    seed              : 10231
    verbosity         : 0

plots:
    calibration       : True
    confusion_matrix  : True
    importances       : True
    learning_curve    : True
    roc_curve         : True

xgboost:
    stopping_rounds   : 20

Creating the Model

First, change the directory to your project location, where you have already followed the Project Structure specifications:

cd path/to/project

Run this command to train a model:

mflow

Usage:

mflow [--train | --predict] [--tdate yyyy-mm-dd] [--pdate yyyy-mm-dd]
--train Train a new model and make predictions (Default)
--predict Make predictions from a saved model
--tdate The training date in format YYYY-MM-DD (Default: Earliest Date in the Data)
--pdate The prediction date in format YYYY-MM-DD (Default: Today’s Date)

Running the Model

In the project location, run mflow with the predict flag. MarketFlow will automatically create the predict.csv file using the pdate option:

mflow --predict [--pdate yyyy-mm-dd]