alphapy package

Submodules

alphapy.__main__ module

alphapy.__main__.main(args=None)

AlphaPy Main Program

Notes

  1. Initialize logging.
  2. Parse the command line arguments.
  3. Get the model configuration.
  4. Create the model object.
  5. Call the main AlphaPy pipeline.
alphapy.__main__.main_pipeline(model)

AlphaPy Main Pipeline

Parameters:model (alphapy.Model) – The model specifications for the pipeline.
Returns:model – The final model.
Return type:alphapy.Model
alphapy.__main__.prediction_pipeline(model)

AlphaPy Prediction Pipeline

Parameters:model (alphapy.Model) – The model object for controlling the pipeline.
Returns:None
Return type:None

Notes

The saved model is loaded from disk, and predictions are made on the new testing data.

alphapy.__main__.training_pipeline(model)

AlphaPy Training Pipeline

Parameters:model (alphapy.Model) – The model object for controlling the pipeline.
Returns:model – The final results are stored in the model object.
Return type:alphapy.Model
Raises:KeyError – If the number of columns of the train and test data do not match, then this exception is raised.

alphapy.alias module

class alphapy.alias.Alias(name, expr, replace=False)

Bases: object

Create a new alias as a key-value pair. All aliases are stored in Alias.aliases. Duplicate keys or values are not allowed, unless the replace parameter is True.

Parameters:
  • name (str) – Alias key.
  • expr (str) – Alias value.
  • replace (bool, optional) – Replace the current key-value pair if it already exists.
Variables:

Alias.aliases (dict) – Class variable for storing all known aliases

Examples

>>> Alias('atr', 'ma_truerange')
>>> Alias('hc', 'higher_close')
aliases = {}
alphapy.alias.get_alias(alias)

Find an alias value with the given key.

Parameters:alias (str) – Key for finding the alias value.
Returns:alias_value – Value for the corresponding key.
Return type:str

Examples

>>> alias_value = get_alias('atr')
>>> alias_value = get_alias('hc')
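
The registry behavior described for Alias and get_alias can be sketched without AlphaPy. This is only an illustration under assumptions: the names AliasSketch and get_alias_sketch are hypothetical, and silently keeping the existing entry when a duplicate key arrives is an assumed behavior, not necessarily what AlphaPy does.

```python
class AliasSketch:
    """Hypothetical stand-in for a class-level alias registry."""

    aliases = {}  # shared by all instances, like Alias.aliases

    def __init__(self, name, expr, replace=False):
        # Assumption: a duplicate key is ignored unless replace=True.
        if name not in AliasSketch.aliases or replace:
            AliasSketch.aliases[name] = expr


def get_alias_sketch(alias):
    """Return the stored value for a key, or None if it is unknown."""
    return AliasSketch.aliases.get(alias)


AliasSketch('atr', 'ma_truerange')
AliasSketch('atr', 'other_expr')                 # ignored: key exists
print(get_alias_sketch('atr'))                   # ma_truerange
AliasSketch('atr', 'other_expr', replace=True)   # overwrites the entry
print(get_alias_sketch('atr'))                   # other_expr
```

Keeping the dictionary on the class rather than on each instance is what lets a plain function look up any alias created earlier.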

alphapy.analysis module

class alphapy.analysis.Analysis(model, group)

Bases: object

Create a new analysis for a group. All analyses are stored in Analysis.analyses. Duplicate keys are not allowed.

Parameters:
  • model (alphapy.Model) – Model object for the analysis.
  • group (alphapy.Group) – The group of members in the analysis.
Variables:

Analysis.analyses (dict) – Class variable for storing all known analyses

analyses = {}
alphapy.analysis.analysis_name(gname, target)

Get the name of the analysis.

Parameters:
  • gname (str) – Group name.
  • target (str) – Target of the analysis.
Returns:

name – Value for the corresponding key.

Return type:

str

alphapy.analysis.run_analysis(analysis, lag_period, forecast_period, leaders, predict_history, splits=True)

Run an analysis for a given model and group.

First, the data are loaded for each member of the analysis group. Then, the target value is lagged for the forecast_period, and any leaders are lagged as well. Each frame is split along the predict_date from the analysis, and finally the train and test files are generated.

Parameters:
  • analysis (alphapy.Analysis) – The analysis to run.
  • lag_period (int) – The number of lagged features for the analysis.
  • forecast_period (int) – The period for forecasting the target of the analysis.
  • leaders (list) – The features that are contemporaneous with the target.
  • predict_history (int) – The number of periods required for lookback calculations.
  • splits (bool, optional) – If True, then the data for each member of the analysis group are in separate files.
Returns:

analysis – The completed analysis.

Return type:

alphapy.Analysis

alphapy.data module

alphapy.data.get_data(model, partition)

Get data for the given partition.

Parameters:
  • model (alphapy.Model) – The model object describing the data.
  • partition (alphapy.Partition) – Reference to the dataset.
Returns:

  • X (pandas.DataFrame) – The feature set.
  • y (pandas.Series) – The array of target values, if available.

alphapy.data.get_feed_data(group, lookback_period)

Get data from an external feed.

Parameters:
  • group (alphapy.Group) – The group of symbols.
  • lookback_period (int) – The number of periods of data to retrieve.
Returns:

n_periods – The maximum number of periods actually retrieved.

Return type:

int

alphapy.data.get_google_data(symbol, lookback_period, fractal)

Get Google Finance intraday data.

We get intraday data from the Google Finance API, even though it is not officially supported. You can retrieve a maximum of 50 days of history, so you may want to build your own database for more extensive backtesting.

Parameters:
  • symbol (str) – A valid stock symbol.
  • lookback_period (int) – The number of days of intraday data to retrieve, capped at 50.
  • fractal (str) – The intraday frequency, e.g., “5m” for 5-minute data.
Returns:

df – The dataframe containing the intraday data.

Return type:

pandas.DataFrame

alphapy.data.get_pandas_data(schema, symbol, lookback_period)

Get Yahoo Finance daily data.

Parameters:
  • schema (str) – The source of the pandas-datareader data.
  • symbol (str) – A valid stock symbol.
  • lookback_period (int) – The number of days of daily data to retrieve.
Returns:

df – The dataframe containing the daily data.

Return type:

pandas.DataFrame

alphapy.data.sample_data(model)

Sample the training data.

Sampling is configured in the model.yml file (data:sampling:method). You can learn more about resampling techniques here [IMB].

Parameters:model (alphapy.Model) – The model object describing the data.
Returns:model – The model object with the sampled data.
Return type:alphapy.Model
alphapy.data.shuffle_data(model)

Randomly shuffle the training data.

Parameters:model (alphapy.Model) – The model object describing the data.
Returns:model – The model object with the shuffled data.
Return type:alphapy.Model

alphapy.estimators module

class alphapy.estimators.AdaBoostClassifierCoef(base_estimator=None, n_estimators=50, learning_rate=1.0, algorithm='SAMME.R', random_state=None)

Bases: sklearn.ensemble.weight_boosting.AdaBoostClassifier

An AdaBoost classifier where the coefficients are set to the feature importances for Recursive Feature Elimination to work.

fit(*args, **kwargs)
class alphapy.estimators.Estimator(algorithm, model_type, estimator, grid, scoring=False)

Store information about each estimator.

Parameters:
  • algorithm (str) – Abbreviation representing the given algorithm.
  • model_type (enum ModelType) – The machine learning task for this algorithm.
  • estimator (function) – A scikit-learn, TensorFlow, or XGBoost function.
  • grid (dict) – The dictionary of hyperparameters for grid search.
  • scoring (bool, optional) – Use a scoring function to evaluate the best model.
class alphapy.estimators.ExtraTreesClassifierCoef(n_estimators=10, criterion='gini', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features='auto', max_leaf_nodes=None, min_impurity_decrease=0.0, min_impurity_split=None, bootstrap=False, oob_score=False, n_jobs=1, random_state=None, verbose=0, warm_start=False, class_weight=None)

Bases: sklearn.ensemble.forest.ExtraTreesClassifier

An Extra Trees classifier where the coefficients are set to the feature importances for Recursive Feature Elimination to work.

fit(*args, **kwargs)
class alphapy.estimators.GradientBoostingClassifierCoef(loss='deviance', learning_rate=0.1, n_estimators=100, subsample=1.0, criterion='friedman_mse', min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_depth=3, min_impurity_decrease=0.0, min_impurity_split=None, init=None, random_state=None, max_features=None, verbose=0, max_leaf_nodes=None, warm_start=False, presort='auto')

Bases: sklearn.ensemble.gradient_boosting.GradientBoostingClassifier

A Gradient Boosting classifier where the coefficients are set to the feature importances for Recursive Feature Elimination to work.

fit(*args, **kwargs)
class alphapy.estimators.RandomForestClassifierCoef(n_estimators=10, criterion='gini', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features='auto', max_leaf_nodes=None, min_impurity_decrease=0.0, min_impurity_split=None, bootstrap=True, oob_score=False, n_jobs=1, random_state=None, verbose=0, warm_start=False, class_weight=None)

Bases: sklearn.ensemble.forest.RandomForestClassifier

A Random Forest classifier where the coefficients are set to the feature importances for Recursive Feature Elimination to work.

fit(*args, **kwargs)
alphapy.estimators.get_algos_config(cfg_dir)

Read the algorithms configuration file.

Parameters:cfg_dir (str) – The directory where the configuration file algos.yml is stored.
Returns:specs – The specifications for determining which algorithms to run.
Return type:dict
alphapy.estimators.get_estimators(model)

Define all the AlphaPy estimators based on the contents of the algos.yml file.

Parameters:model (alphapy.Model) – The model object containing global AlphaPy parameters.
Returns:estimators – All of the estimators required for running the pipeline.
Return type:dict

alphapy.features module

alphapy.features.apply_treatment(fname, df, fparams)

Apply a treatment function to a column of the dataframe.

Parameters:
  • fname (str) – Name of the column to be treated in the dataframe df.
  • df (pandas.DataFrame) – Dataframe containing the column fname.
  • fparams (list) – The module, function, and parameter list of the treatment function
Returns:

new_features – The set of features after applying a treatment function.

Return type:

pandas.DataFrame

alphapy.features.apply_treatments(model, X)

Apply special functions to the original features.

Parameters:
  • model (alphapy.Model) – Model specifications indicating any treatments.
  • X (pandas.DataFrame) – Combined train and test data, or just prediction data.
Returns:

all_features – All features, including treatments.

Return type:

pandas.DataFrame

Raises:

IndexError – The number of treatment rows must match the number of rows in X.

alphapy.features.create_clusters(features, model)

Cluster the given features.

Parameters:
  • features (numpy array) – The features to cluster.
  • model (alphapy.Model) – The model object with the clustering parameters.
Returns:

cfeatures – The calculated clusters.

Return type:

numpy array

References

You can find more information on clustering here [CLUS].

[CLUS]http://scikit-learn.org/stable/modules/clustering.html
alphapy.features.create_crosstabs(model)

Create cross-tabulations for categorical variables.

Parameters:model (alphapy.Model) – The model object containing the data.
Returns:model – The model object with the updated feature map.
Return type:alphapy.Model
alphapy.features.create_features(model, X)

Create features for the train and test set.

Parameters:
  • model (alphapy.Model) – Model object with the feature specifications.
  • X (pandas.DataFrame) – Combined train and test data.
Returns:

all_features – The new features.

Return type:

numpy array

Raises:

TypeError – Unrecognized data type.

alphapy.features.create_interactions(model, X)

Create feature interactions based on the model specifications.

Parameters:
  • model (alphapy.Model) – Model object with train and test data.
  • X (numpy array) – Feature Matrix.
Returns:

all_features – The new interaction features.

Return type:

numpy array

Raises:

TypeError – Unknown model type when creating interactions.

alphapy.features.create_isomap_features(features, model)

Create Isomap features.

Parameters:
  • features (numpy array) – The input features.
  • model (alphapy.Model) – The model object with the Isomap parameters.
Returns:

ifeatures – The Isomap features.

Return type:

numpy array

Notes

Isomaps are very memory-intensive. Your process will be killed if you run out of memory.

References

You can find more information on Isomap here [ISO].

[ISO]http://scikit-learn.org/stable/modules/manifold.html#isomap
alphapy.features.create_numpy_features(base_features, sentinel)

Calculate the sum, mean, standard deviation, and variance of each row.

Parameters:
  • base_features (numpy array) – The feature dataframe.
  • sentinel (float) – The number to be imputed for NaN values.
Returns:

np_features – The calculated NumPy features.

Return type:

numpy array
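
The row-wise statistics named above can be illustrated with the standard library alone. This is a sketch under assumptions: row_stats_sketch is a hypothetical name, missing values are represented by None, and population statistics (the NumPy default) are assumed for the standard deviation and variance.

```python
import statistics


def row_stats_sketch(rows, sentinel=0.0):
    """Return [sum, mean, std, var] for each row of a list of lists."""
    features = []
    for row in rows:
        # Replace missing values (None here) with the sentinel first.
        clean = [sentinel if v is None else float(v) for v in row]
        features.append([
            sum(clean),
            statistics.mean(clean),
            statistics.pstdev(clean),     # population standard deviation
            statistics.pvariance(clean),  # population variance
        ])
    return features


for stats in row_stats_sketch([[1, 2, 3], [4, None, 8]]):
    print(stats)
```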

alphapy.features.create_pca_features(features, model)

Apply Principal Component Analysis (PCA) to the features.

Parameters:
  • features (numpy array) – The input features.
  • model (alphapy.Model) – The model object with the PCA parameters.
Returns:

pfeatures – The PCA features.

Return type:

numpy array

References

You can find more information on Principal Component Analysis here [PCA].

[PCA]http://scikit-learn.org/stable/modules/decomposition.html#pca
alphapy.features.create_scipy_features(base_features, sentinel)

Calculate the skew, kurtosis, and other statistical features for each row.

Parameters:
  • base_features (numpy array) – The feature dataframe.
  • sentinel (float) – The number to be imputed for NaN values.
Returns:

sp_features – The calculated SciPy features.

Return type:

numpy array

alphapy.features.create_tsne_features(features, model)

Create t-SNE features.

Parameters:
  • features (numpy array) – The input features.
  • model (alphapy.Model) – The model object with the t-SNE parameters.
Returns:

tfeatures – The t-SNE features.

Return type:

numpy array

References

You can find more information on the t-SNE technique here [TSNE].

[TSNE]http://scikit-learn.org/stable/modules/manifold.html#t-distributed-stochastic-neighbor-embedding-t-sne
alphapy.features.cvectorize(f, c, n)

Use the Count Vectorizer and TF-IDF Transformer.

Parameters:
  • f (pandas.DataFrame) – Dataframe containing the column c.
  • c (str) – Name of the text column in the dataframe f.
  • n (int) – The number of n-grams.
Returns:

new_features – The transformed features.

Return type:

sparse matrix

References

You can find more information on count vectorization and TF-IDF here [TFE].

[TFE]http://scikit-learn.org/stable/modules/feature_extraction.html#text-feature-extraction
alphapy.features.drop_features(X, drop)

Drop any specified features.

Parameters:
  • X (pandas.DataFrame) – The dataframe containing the features.
  • drop (list) – The list of features to remove from X.
Returns:

X – The dataframe without the dropped features.

Return type:

pandas.DataFrame

alphapy.features.float_factor(x, rounding)

Convert a floating point number to a factor.

Parameters:
  • x (float) – The value to convert to a factor.
  • rounding (int) – The number of places to round.
Returns:

ffactor – The resulting factor.

Return type:

int
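
One plausible reading of "convert a floating point number to a factor" is to round the value and keep only its digits as an integer. The sketch below assumes that reading; float_factor_sketch is a hypothetical name and the actual AlphaPy logic may differ.

```python
import re


def float_factor_sketch(x, rounding):
    """Round x to `rounding` places and keep only its digits as an int."""
    text = '{0:.{1}f}'.format(x, rounding)   # e.g. 3.14159 -> '3.14'
    digits = re.sub(r'[^0-9]', '', text)     # drop sign and decimal point
    return int(digits) if digits else 0


print(float_factor_sketch(3.14159, 2))  # 314
print(float_factor_sketch(0.5, 1))      # 5
```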

alphapy.features.get_factors(model, df, fnum, fname, nvalues, dtype, encoder, rounding, sentinel)

Convert the original feature to a factor.

Parameters:
  • model (alphapy.Model) – Model object with the feature specifications.
  • df (pandas.DataFrame) – Dataframe containing the column fname.
  • fnum (int) – Feature number, strictly for logging purposes
  • fname (str) – Name of the column in the dataframe df.
  • nvalues (int) – The number of unique values.
  • dtype (str) – The values 'float64', 'int64', or 'bool'.
  • encoder (alphapy.features.Encoders) – Type of encoder to apply.
  • rounding (int) – Number of places to round.
  • sentinel (float) – The number to be imputed for NaN values.
Returns:

all_features – The features that have been transformed to factors.

Return type:

numpy array

alphapy.features.get_numerical_features(fnum, fname, df, nvalues, dt, sentinel, logt, plevel)

Transform numerical features with imputation and possibly log-transformation.

Parameters:
  • fnum (int) – Feature number, strictly for logging purposes
  • fname (str) – Name of the numerical column in the dataframe df.
  • df (pandas.DataFrame) – Dataframe containing the column fname.
  • nvalues (int) – The number of unique values.
  • dt (str) – The values 'float64', 'int64', or 'bool'.
  • sentinel (float) – The number to be imputed for NaN values.
  • logt (bool) – If True, then log-transform numerical values.
  • plevel (float) – The p-value threshold to test if a feature is normally distributed.
Returns:

new_values – The set of imputed and transformed features.

Return type:

numpy array

alphapy.features.get_polynomials(features, poly_degree)

Generate interactions that are products of distinct features.

Parameters:
  • features (pandas.DataFrame) – Dataframe containing the features for generating interactions.
  • poly_degree (int) – The degree of the polynomial features.
Returns:

poly_features – The interaction features only.

Return type:

numpy array

References

You can find more information on polynomial interactions here [POLY].

[POLY]http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.PolynomialFeatures.html
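
"Products of distinct features" can be illustrated for a single row with itertools. This sketch assumes the result excludes the original features and any bias term, similar in spirit to the interaction_only option of scikit-learn's PolynomialFeatures; interaction_products_sketch is a hypothetical name.

```python
from itertools import combinations
from math import prod


def interaction_products_sketch(row, degree):
    """Products of distinct features, for orders 2 through `degree`."""
    products = []
    for order in range(2, degree + 1):
        for combo in combinations(row, order):
            products.append(prod(combo))
    return products


print(interaction_products_sketch([2, 3, 5], 2))  # [6, 10, 15]
print(interaction_products_sketch([2, 3, 5], 3))  # [6, 10, 15, 30]
```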
alphapy.features.get_text_features(fnum, fname, df, nvalues, vectorize, ngrams_max)

Transform text features with count vectorization and TF-IDF, or alternatively factorization.

Parameters:
  • fnum (int) – Feature number, strictly for logging purposes
  • fname (str) – Name of the text column in the dataframe df.
  • df (pandas.DataFrame) – Dataframe containing the column fname.
  • nvalues (int) – The number of unique values.
  • vectorize (bool) – If True, then attempt count vectorization.
  • ngrams_max (int) – The maximum number of n-grams for count vectorization.
Returns:

new_features – The vectorized or factorized text features.

Return type:

numpy array

References

You can find more information on count vectorization and TF-IDF here [TFE].

alphapy.features.impute_values(features, dt, sentinel)

Impute values for a given data type. The median strategy is applied for floating point values, and the most frequent strategy is applied for integer or Boolean values.

Parameters:
  • features (pandas.DataFrame) – Dataframe containing the features for imputation.
  • dt (str) – The values 'float64', 'int64', or 'bool'.
  • sentinel (float) – The number to be imputed for NaN values.
Returns:

imputed_features – The features after imputation.

Return type:

numpy array

Raises:

TypeError – Data type dt is invalid for imputation.

References

You can find more information on feature imputation here [IMP].

[IMP]http://scikit-learn.org/stable/modules/preprocessing.html#imputation
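
The two strategies described above (median for floating point values, most frequent for integer or Boolean values) can be sketched on a plain list. Assumptions: impute_sketch is a hypothetical name, missing values are represented by None, and the role of the sentinel parameter is not modeled.

```python
import statistics
from collections import Counter


def impute_sketch(values, dt):
    """Fill None entries using the strategy the text gives for each dtype."""
    present = [v for v in values if v is not None]
    if dt == 'float64':
        fill = statistics.median(present)
    elif dt in ('int64', 'bool'):
        fill = Counter(present).most_common(1)[0][0]
    else:
        raise TypeError('Data type %s is invalid for imputation' % dt)
    return [fill if v is None else v for v in values]


print(impute_sketch([1.0, None, 3.0, 10.0], 'float64'))  # [1.0, 3.0, 3.0, 10.0]
print(impute_sketch([1, 1, None, 2], 'int64'))           # [1, 1, 1, 2]
```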
alphapy.features.remove_lv_features(model, X)

Remove low-variance features.

Parameters:
  • model (alphapy.Model) – Model specifications for removing features.
  • X (numpy array) – The feature matrix.
Returns:

X_reduced – The reduced feature matrix.

Return type:

numpy array

References

You can find more information on low-variance feature selection here [LV].

[LV]http://scikit-learn.org/stable/modules/feature_selection.html#variance-threshold
alphapy.features.rtotal(vec)

Calculate the running total.

Parameters:vec (pandas.Series) – The input array for calculating the running total.
Returns:running_total – The final running total.
Return type:int

Example

>>> vec.rolling(window=20).apply(rtotal)
alphapy.features.runs(vec)

Calculate the total number of runs.

Parameters:vec (pandas.Series) – The input array for calculating the number of runs.
Returns:runs_value – The total number of runs.
Return type:int

Example

>>> vec.rolling(window=20).apply(runs)
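
A run is a maximal block of consecutive equal values, so counting runs amounts to counting groups of adjacent duplicates. The sketch below shows that idea on a plain list; count_runs_sketch is a hypothetical name, not the AlphaPy function.

```python
from itertools import groupby


def count_runs_sketch(values):
    """Count maximal blocks of consecutive equal values."""
    return sum(1 for _ in groupby(values))


print(count_runs_sketch([1, 1, 0, 0, 0, 1, 0]))  # 4
```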
alphapy.features.runs_test(f, c, wfuncs, window)

Perform a runs test on binary series.

Parameters:
  • f (pandas.DataFrame) – Dataframe containing the column c.

  • c (str) – Name of the column in the dataframe f.

  • wfuncs (list) – The set of runs test functions to apply to the column:

    'all':

    Run all of the functions below.

    'rtotal':

    The running total over the window period.

    'runs':

    Total number of runs in window.

    'streak':

    The length of the latest streak.

    'zscore':

    The Z-Score over the window period.

  • window (int) – The rolling period.

Returns:

new_features – The dataframe containing the runs test features.

Return type:

pandas.DataFrame

References

For more information about runs tests for detecting non-randomness, refer to [RUNS].

[RUNS]http://www.itl.nist.gov/div898/handbook/eda/section3/eda35d.htm
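
The runs test statistic from the NIST reference compares the observed number of runs with the number expected for a random binary sequence. The following is a sketch of that formula, assuming it is what the 'zscore' option computes; runs_zscore_sketch is a hypothetical name, and returning 0.0 for degenerate inputs is a choice made here.

```python
import math
from itertools import groupby


def runs_zscore_sketch(values):
    """Z statistic of the runs test for a binary sequence."""
    n1 = sum(1 for v in values if v)          # count of truthy values
    n2 = len(values) - n1                     # count of falsy values
    if n1 == 0 or n2 == 0:
        return 0.0                            # only one kind of value
    n_runs = sum(1 for _ in groupby(bool(v) for v in values))
    expected = 2.0 * n1 * n2 / (n1 + n2) + 1
    variance = (2.0 * n1 * n2 * (2.0 * n1 * n2 - n1 - n2)
                / ((n1 + n2) ** 2 * (n1 + n2 - 1)))
    if variance <= 0:
        return 0.0
    return (n_runs - expected) / math.sqrt(variance)


print(round(runs_zscore_sketch([1, 1, 0, 0, 1, 0, 1, 1]), 3))  # 0.206
```

A value near zero, as here, means the number of runs is close to what randomness would produce.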
alphapy.features.save_features(model, X_train, X_test, y_train=None, y_test=None)

Save new features to the model.

Parameters:
  • model (alphapy.Model) – Model object with train and test data.
  • X_train (numpy array) – Training features.
  • X_test (numpy array) – Testing features.
  • y_train (numpy array) – Training labels.
  • y_test (numpy array) – Testing labels.
Returns:

model – Model object with new train and test data.

Return type:

alphapy.Model

alphapy.features.select_features(model)

Select features with univariate selection.

Parameters:model (alphapy.Model) – Model object with the feature selection specifications.
Returns:model – Model object with the revised number of features.
Return type:alphapy.Model

References

You can find more information on univariate feature selection here [UNI].

[UNI]http://scikit-learn.org/stable/modules/feature_selection.html#univariate-feature-selection
alphapy.features.split_to_letters(f, c)

Separate text into distinct characters.

Parameters:
  • f (pandas.DataFrame) – Dataframe containing the column c.
  • c (str) – Name of the text column in the dataframe f.
Returns:

new_feature – The array containing the new feature.

Return type:

pandas.Series

Example

The value ‘abc’ becomes ‘a b c’.
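
The transformation in the example is a one-liner in plain Python. The helper name below is hypothetical, and the sketch works on a single string rather than on a dataframe column.

```python
def split_to_letters_sketch(text):
    """Insert a space between every character of the text."""
    return ' '.join(text)


print(split_to_letters_sketch('abc'))  # a b c
```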

alphapy.features.streak(vec)

Determine the length of the latest streak.

Parameters:vec (pandas.Series) – The input array for calculating the latest streak.
Returns:latest_streak – The length of the latest streak.
Return type:int

Example

>>> vec.rolling(window=20).apply(streak)
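
The latest streak is the length of the final block of consecutive equal values. A minimal sketch on a plain list follows; latest_streak_sketch is a hypothetical name, not the AlphaPy function.

```python
from itertools import groupby


def latest_streak_sketch(values):
    """Length of the final block of consecutive equal values."""
    length = 0
    for _, block in groupby(values):
        length = sum(1 for _ in block)  # overwritten until the last block
    return length


print(latest_streak_sketch([0, 1, 1, 0, 0, 0]))  # 3
```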
alphapy.features.texplode(f, c)

Get dummy values for a text column.

Parameters:
  • f (pandas.DataFrame) – Dataframe containing the column c.
  • c (str) – Name of the text column in the dataframe f.
Returns:

dummies – The dataframe containing the dummy variables.

Return type:

pandas.DataFrame

Example

This function is useful for columns that appear to have separate character codes but are consolidated into a single column. Here, the column c is transformed into five dummy variables.

c     0_a   1_x   1_b   2_x   2_z
abz   1     0     1     0     1
abz   1     0     1     0     1
axx   1     1     0     1     0
abz   1     0     1     0     1
axz   1     1     0     0     1
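
The dummy columns in the table above are named by character position and character. The sketch below reproduces the same values from a plain list of codes; texplode_sketch is a hypothetical name, and the columns come out in sorted order, which differs slightly from the order shown in the table.

```python
def texplode_sketch(values):
    """Build position_character dummy columns for a list of codes."""
    columns = sorted({'%d_%s' % (i, ch)
                      for value in values
                      for i, ch in enumerate(value)})
    rows = []
    for value in values:
        present = {'%d_%s' % (i, ch) for i, ch in enumerate(value)}
        rows.append([1 if col in present else 0 for col in columns])
    return columns, rows


columns, rows = texplode_sketch(['abz', 'abz', 'axx', 'abz', 'axz'])
print(columns)  # ['0_a', '1_b', '1_x', '2_x', '2_z']
for row in rows:
    print(row)
```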
alphapy.features.zscore(vec)

Calculate the Z-Score.

Parameters:vec (pandas.Series) – The input array for calculating the Z-Score.
Returns:zscore – The value of the Z-Score.
Return type:float

References

You can find more information on calculating the Z-Score here [ZSCORE].

[ZSCORE]https://en.wikipedia.org/wiki/Standard_score

Example

>>> vec.rolling(window=20).apply(zscore)

alphapy.frame module

class alphapy.frame.Frame(name, space, df)

Bases: object

Create a new Frame that points to a dataframe in memory. All frames are stored in Frame.frames. Names must be unique.

Parameters:
  • name (str) – Frame key.
  • space (alphapy.Space) – Namespace of the given frame.
  • df (pandas.DataFrame) – The contents of the actual dataframe.
Variables:

frames (dict) – Class variable for storing all known frames

Examples

>>> Frame('tech', Space('stock', 'prices', '5m'), df)
frames = {}
alphapy.frame.dump_frames(group, directory, extension, separator)

Save a group of data frames to disk.

Parameters:
  • group (alphapy.Group) – The collection of frames to be saved to the file system.
  • directory (str) – Full directory specification.
  • extension (str) – File name extension, e.g., csv.
  • separator (str) – The delimiter between fields in the file.
Returns:

None

Return type:

None

alphapy.frame.frame_name(name, space)

Get the frame name for the given name and space.

Parameters:
  • name (str) – Group name.
  • space (alphapy.Space) – Context or namespace for the given group name.
Returns:

fname – Frame name.

Return type:

str

Examples

>>> fname = frame_name('tech', Space('stock', 'prices', '1d'))
# 'tech_stock_prices_1d'
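
Judging by the example output, the frame name appears to be the group name joined to the fields of the space with underscores. The sketch below assumes exactly that; SpaceSketch and its field names are hypothetical stand-ins for alphapy.Space.

```python
from collections import namedtuple

# Hypothetical stand-in for alphapy.Space; the field names are assumptions.
SpaceSketch = namedtuple('SpaceSketch', ['subject', 'schema', 'fractal'])


def frame_name_sketch(name, space):
    """Join the name and the space fields with underscores."""
    return '_'.join([name, space.subject, space.schema, space.fractal])


print(frame_name_sketch('tech', SpaceSketch('stock', 'prices', '1d')))
# tech_stock_prices_1d
```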
alphapy.frame.load_frames(group, directory, extension, separator, splits=False)

Read a group of dataframes into memory.

Parameters:
  • group (alphapy.Group) – The collection of frames to be read into memory.
  • directory (str) – Full directory specification.
  • extension (str) – File name extension, e.g., csv.
  • separator (str) – The delimiter between fields in the file.
  • splits (bool, optional) – If True, then the data are stored in separate files corresponding with each member of the group. If False, then all the members of the group are stored in one file.
Returns:

all_frames – The list of pandas dataframes loaded from the file location. If the files cannot be located, then None is returned.

Return type:

list

alphapy.frame.read_frame(directory, filename, extension, separator, index_col=None, squeeze=False)

Read a delimiter-separated file into a data frame.

Parameters:
  • directory (str) – Full directory specification.
  • filename (str) – Name of the file to read, excluding the extension.
  • extension (str) – File name extension, e.g., csv.
  • separator (str) – The delimiter between fields in the file.
  • index_col (str, optional) – Column to use as the row labels in the dataframe.
  • squeeze (bool, optional) – If the data contains only one column, then return a pandas Series.
Returns:

df – The pandas dataframe loaded from the file location. If the file cannot be located, then None is returned.

Return type:

pandas.DataFrame

alphapy.frame.sequence_frame(df, target, leaders, lag_period=1, forecast_period=1, exclude_cols=[])

Create sequences of lagging and leading values from a dataframe.

Parameters:
  • df (pandas.DataFrame) – The original dataframe.
  • target (str) – The target variable for prediction.
  • leaders (list) – The features that are contemporaneous with the target.
  • lag_period (int) – The number of lagged rows for prediction.
  • forecast_period (int) – The period for forecasting the target of the analysis.
Returns:

new_frame – The transformed dataframe with variable sequences.

Return type:

pandas.DataFrame
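
The roles of lag_period and forecast_period can be illustrated on a plain list. This is a sketch of the general idea only: sequence_rows_sketch is a hypothetical name, the exact shifting rules are assumptions, and leaders and exclude_cols are not modeled.

```python
def sequence_rows_sketch(series, lag_period=1, forecast_period=1):
    """Pair each value's lagged history with a future target value."""
    rows = []
    last = len(series) - forecast_period
    for t in range(lag_period, last):
        lags = [series[t - k] for k in range(1, lag_period + 1)]
        rows.append({
            'current': series[t],
            'lags': lags,                          # most recent lag first
            'target': series[t + forecast_period],
        })
    return rows


for row in sequence_rows_sketch([10, 11, 12, 13, 14], lag_period=2):
    print(row)
```

Rows without a full lag history or without a future target are dropped, which is why five inputs yield only two rows here.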

alphapy.frame.write_frame(df, directory, filename, extension, separator, index=False, index_label=None)

Write a dataframe into a delimiter-separated file.

Parameters:
  • df (pandas.DataFrame) – The pandas dataframe to save to a file.
  • directory (str) – Full directory specification.
  • filename (str) – Name of the file to write, excluding the extension.
  • extension (str) – File name extension, e.g., csv.
  • separator (str) – The delimiter between fields in the file.
  • index (bool, optional) – If True, write the row names (index).
  • index_label (str, optional) – A column label for the index.
Returns:

None

Return type:

None

alphapy.globals module

class alphapy.globals.Encoders

Bases: enum.Enum

AlphaPy Encoders.

These are the encoders used in AlphaPy, as configured in the model.yml file (features:encoding:type). You can learn more about encoders here [ENC].

[ENC]https://github.com/scikit-learn-contrib/categorical-encoding
backdiff = 1
binary = 2
factorize = 3
helmert = 4
onehot = 5
ordinal = 6
polynomial = 7
sumcont = 8
class alphapy.globals.ModelType

Bases: enum.Enum

AlphaPy Model Types.

Note

One-class classification (oneclass) is not yet implemented.

classification = 1
clustering = 2
multiclass = 3
oneclass = 4
regression = 5
class alphapy.globals.Objective

Bases: enum.Enum

Scoring Function Objectives.

Best model selection is based on the scoring or Objective function, which must be either maximized or minimized. For example, roc_auc is maximized, while neg_log_loss is minimized.

maximize = 1
minimize = 2
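
How an objective drives model selection can be sketched as follows. The enum mirrors the two members listed above; best_model_sketch, the model names, and the scores are hypothetical and only illustrate the maximize/minimize choice.

```python
from enum import Enum


class Objective(Enum):
    maximize = 1
    minimize = 2


def best_model_sketch(scores, objective):
    """Return the name of the model with the best score."""
    pick = max if objective is Objective.maximize else min
    return pick(scores, key=scores.get)


print(best_model_sketch({'RF': 0.91, 'XGB': 0.94}, Objective.maximize))  # XGB
print(best_model_sketch({'RF': 0.42, 'XGB': 0.37}, Objective.minimize))  # XGB
```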
class alphapy.globals.Orders

System Order Types.

Variables:
  • le (str) – long entry
  • se (str) – short entry
  • lx (str) – long exit
  • sx (str) – short exit
  • lh (str) – long exit at the end of the holding period
  • sh (str) – short exit at the end of the holding period
le = 'le'
lh = 'lh'
lx = 'lx'
se = 'se'
sh = 'sh'
sx = 'sx'
class alphapy.globals.Partition

Bases: enum.Enum

AlphaPy Partitions.

predict = 1
test = 2
train = 3
class alphapy.globals.SamplingMethod

Bases: enum.Enum

AlphaPy Sampling Methods.

These are the data sampling methods used in AlphaPy, as configured in the model.yml file (data:sampling:method). You can learn more about resampling techniques here [IMB].

[IMB]https://github.com/scikit-learn-contrib/imbalanced-learn
ensemble_bc = 1
ensemble_easy = 2
over_random = 3
over_smote = 4
over_smoteb = 5
over_smotesv = 6
overunder_smote_enn = 7
overunder_smote_tomek = 8
under_cluster = 9
under_ncr = 10
under_nearmiss = 11
under_random = 12
under_tomek = 13
class alphapy.globals.Scalers

Bases: enum.Enum

AlphaPy Scalers.

These are the scaling methods used in AlphaPy, as configured in the model.yml file (features:scaling:type). You can learn more about feature scaling here [SCALE].

[SCALE]http://scikit-learn.org/stable/modules/preprocessing.html
minmax = 1
standard = 2

alphapy.group module

class alphapy.group.Group(name, space=<alphapy.space.Space instance>, dynamic=True, members=set([]))

Bases: object

Create a new Group that contains common members. All defined groups are stored in Group.groups. Group names must be unique.

Parameters:
  • name (str) – Group name.
  • space (alphapy.Space, optional) – Namespace for the given group.
  • dynamic (bool, optional, default True) – Flag for defining whether or not the group membership can change.
  • members (set, optional) – The initial members of the group, especially if the new group is fixed, i.e., not dynamic.
Variables:

groups (dict) – Class variable for storing all known groups

Examples

>>> Group('tech')
add(newlist)

Add new members to the group.

Parameters:newlist (list) – New members or identifiers to add to the group.
Returns:None
Return type:None

Notes

New members cannot be added to a fixed or non-dynamic group.

groups = {}
member(item)

Find a member in the group.

Parameters:item (str) – The member to find in the group.
Returns:member_exists – Flag indicating whether or not the member is in the group.
Return type:bool
remove(remlist)

Remove members from the group.

Parameters:remlist (list) – The list of members to remove from the group.
Returns:None
Return type:None

Notes

Members cannot be removed from a fixed or non-dynamic group.
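
The difference between dynamic and fixed groups can be sketched with a small class. Assumptions: GroupSketch is a hypothetical stand-in, the space parameter is omitted, and a fixed group silently ignores add and remove requests, which may not match how AlphaPy reports the condition.

```python
class GroupSketch:
    """Hypothetical stand-in for a named group with optional fixed members."""

    groups = {}  # class-level registry, like Group.groups

    def __init__(self, name, dynamic=True, members=None):
        self.name = name
        self.dynamic = dynamic
        self.members = set(members or [])
        GroupSketch.groups[name] = self

    def add(self, newlist):
        # Assumption: a fixed group silently ignores the request.
        if self.dynamic:
            self.members.update(newlist)

    def remove(self, remlist):
        if self.dynamic:
            self.members.difference_update(remlist)

    def member(self, item):
        return item in self.members


tech = GroupSketch('tech')
tech.add(['aapl', 'msft'])
tech.remove(['msft'])
print(tech.member('aapl'), tech.member('msft'))  # True False

fixed = GroupSketch('fixed', dynamic=False, members=['ibm'])
fixed.add(['orcl'])
print(sorted(fixed.members))  # ['ibm']
```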

alphapy.market_flow module

alphapy.market_flow.get_market_config()

Read the configuration file for MarketFlow.

Parameters:None (None)
Returns:specs – The parameters for controlling MarketFlow.
Return type:dict
alphapy.market_flow.main(args=None)

MarketFlow Main Program

Notes

  1. Initialize logging.
  2. Parse the command line arguments.
  3. Get the market configuration.
  4. Get the model configuration.
  5. Create the model object.
  6. Call the main MarketFlow pipeline.
Raises:ValueError – Training date must be before prediction date.
alphapy.market_flow.market_pipeline(model, market_specs)

AlphaPy MarketFlow Pipeline

Parameters:
  • model (alphapy.Model) – The model object for AlphaPy.
  • market_specs (dict) – The specifications for controlling the MarketFlow pipeline.
Returns:

model – The final results are stored in the model object.

Return type:

alphapy.Model

Notes

  1. Define a group.
  2. Get the market data.
  3. Apply system features.
  4. Create an analysis.
  5. Run the analysis, which calls AlphaPy.

alphapy.market_variables module

class alphapy.market_variables.Variable(name, expr, replace=False)

Bases: object

Create a new variable as a key-value pair. All variables are stored in Variable.variables. Duplicate keys or values are not allowed, unless the replace parameter is True.

Parameters:
  • name (str) – Variable key.
  • expr (str) – Variable value.
  • replace (bool, optional) – Replace the current key-value pair if it already exists.
Variables:

variables (dict) – Class variable for storing all known variables

Examples

>>> Variable('rrunder', 'rr_3_20 <= 0.9')
>>> Variable('hc', 'higher_close')
variables = {}
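
The key-value behavior described above can be sketched as a small class-level registry. This is a minimal illustration, not the AlphaPy implementation: the name VariableRegistry and its define method are hypothetical, and returning False for a rejected duplicate is an assumption.

```python
class VariableRegistry:
    """Hypothetical stand-in for a class-level variable store."""

    variables = {}  # shared by the whole class, like Variable.variables

    @classmethod
    def define(cls, name, expr, replace=False):
        # Reject a duplicate key or a duplicate value unless replace is set.
        duplicate = name in cls.variables or expr in cls.variables.values()
        if duplicate and not replace:
            return False
        cls.variables[name] = expr
        return True


VariableRegistry.define('rrunder', 'rr_3_20 <= 0.9')
VariableRegistry.define('hc', 'higher_close')
# A second definition of 'hc' is ignored unless replace=True.
ignored = VariableRegistry.define('hc', 'higher_close_2')
replaced = VariableRegistry.define('hc', 'higher_close_2', replace=True)
```
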
alphapy.market_variables.abovema(f, c, p=50)

Determine those values of the dataframe that are above the moving average.

Parameters:
  • f (pandas.DataFrame) – Dataframe containing the column c.
  • c (str) – Name of the column in the dataframe f.
  • p (int) – The period of the moving average.
Returns:

new_column – The array containing the new feature.

Return type:

pandas.Series (bool)
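
A comparison of this kind can be sketched with a rolling mean in pandas. The function name abovema_sketch is hypothetical, and the use of a simple (unweighted) moving average is an assumption rather than a statement about the library's internals.

```python
import pandas as pd


def abovema_sketch(f, c, p=50):
    """Flag rows where column c is above its p-period simple moving average."""
    moving_average = f[c].rolling(p).mean()
    # Comparisons against NaN (the first p - 1 rows) evaluate to False.
    return f[c] > moving_average


prices = pd.DataFrame({'close': [10.0, 11.0, 12.0, 11.0, 13.0]})
flags = abovema_sketch(prices, 'close', p=3)
```

The first two rows have no complete 3-period window, so they are flagged False.
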

alphapy.market_variables.adx(f, p=14)

Calculate the Average Directional Index (ADX).

Parameters:
  • f (pandas.DataFrame) – Dataframe with all columns required for calculation. If you are applying ADX through vapply, then these columns are calculated automatically.
  • p (int) – The period over which to calculate the ADX.
Returns:

new_column – The array containing the new feature.

Return type:

pandas.Series (float)

References

The Average Directional Movement Index (ADX) was invented by J. Welles Wilder in 1978 [WIKI_ADX]. Its value reflects the strength of trend in any given instrument.

[WIKI_ADX] https://en.wikipedia.org/wiki/Average_directional_movement_index
alphapy.market_variables.allvars(expr)

Get the list of valid names in the expression.

Parameters:expr (str) – A valid expression conforming to the Variable Definition Language.
Returns:vlist – List of valid variable names.
Return type:list
alphapy.market_variables.belowma(f, c, p=50)

Determine those values of the dataframe that are below the moving average.

Parameters:
  • f (pandas.DataFrame) – Dataframe containing the column c.
  • c (str) – Name of the column in the dataframe f.
  • p (int) – The period of the moving average.
Returns:

new_column – The array containing the new feature.

Return type:

pandas.Series (bool)

alphapy.market_variables.c2max(f, c1, c2)

Take the maximum value between two columns in a dataframe.

Parameters:
  • f (pandas.DataFrame) – Dataframe containing the two columns c1 and c2.
  • c1 (str) – Name of the first column in the dataframe f.
  • c2 (str) – Name of the second column in the dataframe f.
Returns:

max_val – The maximum value of the two columns.

Return type:

float

alphapy.market_variables.c2min(f, c1, c2)

Take the minimum value between two columns in a dataframe.

Parameters:
  • f (pandas.DataFrame) – Dataframe containing the two columns c1 and c2.
  • c1 (str) – Name of the first column in the dataframe f.
  • c2 (str) – Name of the second column in the dataframe f.
Returns:

min_val – The minimum value of the two columns.

Return type:

float

alphapy.market_variables.diff(f, c, n=1)

Calculate the n-th order difference for the given variable.

Parameters:
  • f (pandas.DataFrame) – Dataframe containing the column c.
  • c (str) – Name of the column in the dataframe f.
  • n (int) – The number of times that the values are differenced.
Returns:

new_column – The array containing the new feature.

Return type:

pandas.Series (float)

alphapy.market_variables.diminus(f, p=14)

Calculate the Minus Directional Indicator (-DI).

Parameters:
  • f (pandas.DataFrame) – Dataframe with columns high and low.
  • p (int) – The period over which to calculate the -DI.
Returns:

new_column – The array containing the new feature.

Return type:

pandas.Series (float)

References

The -DI is a component of the average directional index (ADX) that is used to measure the presence of a downtrend. When the -DI is sloping upward, it is a signal that the downtrend is getting stronger [IP_NDI].

[IP_NDI] http://www.investopedia.com/terms/n/negativedirectionalindicator.asp
alphapy.market_variables.diplus(f, p=14)

Calculate the Plus Directional Indicator (+DI).

Parameters:
  • f (pandas.DataFrame) – Dataframe with columns high and low.
  • p (int) – The period over which to calculate the +DI.
Returns:

new_column – The array containing the new feature.

Return type:

pandas.Series (float)

References

The +DI is a component of the average directional index (ADX) that is used to measure the presence of an uptrend. When the +DI is sloping upward, it is a signal that the uptrend is getting stronger [IP_PDI].

[IP_PDI] http://www.investopedia.com/terms/p/positivedirectionalindicator.asp
alphapy.market_variables.dminus(f)

Calculate the Minus Directional Movement (-DM).

Parameters:f (pandas.DataFrame) – Dataframe with columns high and low.
Returns:new_column – The array containing the new feature.
Return type:pandas.Series (float)

References

Directional movement is negative (minus) when the prior low minus the current low is greater than the current high minus the prior high. This so-called Minus Directional Movement (-DM) equals the prior low minus the current low, provided it is positive. A negative value would simply be entered as zero [SC_ADX].

alphapy.market_variables.dmplus(f)

Calculate the Plus Directional Movement (+DM).

Parameters:f (pandas.DataFrame) – Dataframe with columns high and low.
Returns:new_column – The array containing the new feature.
Return type:pandas.Series (float)

References

Directional movement is positive (plus) when the current high minus the prior high is greater than the prior low minus the current low. This so-called Plus Directional Movement (+DM) then equals the current high minus the prior high, provided it is positive. A negative value would simply be entered as zero [SC_ADX].

[SC_ADX] http://stockcharts.com/school/doku.php?id=chart_school:technical_indicators:average_directional_index_adx
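
The +DM and -DM rules quoted above can be worked through on a few bars. This sketch uses plain lists rather than a dataframe; the function name directional_movement is hypothetical, and assigning zero to the first bar (which has no prior bar) is an assumption.

```python
def directional_movement(highs, lows):
    """Return (+DM, -DM) lists; the first bar has no prior bar, so it gets 0."""
    plus_dm, minus_dm = [0.0], [0.0]
    for i in range(1, len(highs)):
        up_move = highs[i] - highs[i - 1]      # current high minus prior high
        down_move = lows[i - 1] - lows[i]      # prior low minus current low
        plus_dm.append(up_move if up_move > down_move and up_move > 0 else 0.0)
        minus_dm.append(down_move if down_move > up_move and down_move > 0 else 0.0)
    return plus_dm, minus_dm


highs = [10.0, 12.0, 11.0, 11.5]
lows = [9.0, 10.0, 8.0, 9.5]
plus_dm, minus_dm = directional_movement(highs, lows)
```

At most one of the two values is nonzero on any bar, which follows directly from the "greater than" condition in each definition.
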
alphapy.market_variables.down(f, c)

Find the negative values in the series.

Parameters:
  • f (pandas.DataFrame) – Dataframe containing the column c.
  • c (str) – Name of the column in the dataframe f.
Returns:

new_column – The array containing the new feature.

Return type:

pandas.Series (bool)

alphapy.market_variables.dpc(f, c)

Get the negative values, with positive values zeroed.

Parameters:
  • f (pandas.DataFrame) – Dataframe with column c.
  • c (str) – Name of the column.
Returns:

new_column – The array containing the new feature.

Return type:

pandas.Series (float)

alphapy.market_variables.ema(f, c, p=20)

Calculate the exponentially weighted mean on a rolling basis.

Parameters:
  • f (pandas.DataFrame) – Dataframe containing the column c.
  • c (str) – Name of the column in the dataframe f.
  • p (int) – The period over which to calculate the exponential moving average.
Returns:

new_column – The array containing the new feature.

Return type:

pandas.Series (float)

References

An exponential moving average (EMA) is a type of moving average that is similar to a simple moving average, except that more weight is given to the latest data [IP_EMA].

[IP_EMA] http://www.investopedia.com/terms/e/ema.asp
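
To illustrate how an EMA gives more weight to the latest data, here is a sketch of the common recursive form. The smoothing factor 2 / (p + 1) and seeding with the first value are widely used conventions assumed here; the library's own calculation may differ. The name ema_sketch is hypothetical.

```python
def ema_sketch(values, p=20):
    """Recursive EMA with smoothing factor alpha = 2 / (p + 1)."""
    alpha = 2.0 / (p + 1)
    smoothed = [values[0]]  # assumption: seed with the first observation
    for value in values[1:]:
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed


series = [10.0, 10.0, 10.0, 16.0]
ema_values = ema_sketch(series, p=3)
simple_mean = sum(series[-3:]) / 3  # 3-period simple average for comparison
```

After the jump to 16, the EMA ends at 13 while the 3-period simple average ends at 12, because the newest value carries half of the weight.
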
alphapy.market_variables.gap(f)

Calculate the gap percentage between the current open and the previous close.

Parameters:f (pandas.DataFrame) – Dataframe with columns open and close.
Returns:new_column – The array containing the new feature.
Return type:pandas.Series (float)

References

A gap is a break between prices on a chart that occurs when the price of a stock makes a sharp move up or down with no trading occurring in between [IP_GAP].

[IP_GAP] http://www.investopedia.com/terms/g/gap.asp
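
The gap percentage between the current open and the previous close can be sketched as follows. Plain lists stand in for dataframe columns, the name gap_percent is hypothetical, and scaling the result by 100 is an assumption.

```python
def gap_percent(opens, closes):
    """Percentage difference between each open and the previous close."""
    gaps = [None]  # the first bar has no previous close
    for i in range(1, len(opens)):
        prev_close = closes[i - 1]
        gaps.append(100.0 * (opens[i] - prev_close) / prev_close)
    return gaps


opens = [100.0, 102.0, 99.0]
closes = [100.0, 100.0, 101.0]
gaps = gap_percent(opens, closes)
```

A positive value corresponds to a gap up and a negative value to a gap down.
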
alphapy.market_variables.gapbadown(f)

Determine whether or not there has been a breakaway gap down.

Parameters:f (pandas.DataFrame) – Dataframe with columns open and low.
Returns:new_column – The array containing the new feature.
Return type:pandas.Series (bool)

References

A breakaway gap represents a gap in the movement of a stock price supported by levels of high volume [IP_BAGAP].

[IP_BAGAP] http://www.investopedia.com/terms/b/breakawaygap.asp
alphapy.market_variables.gapbaup(f)

Determine whether or not there has been a breakaway gap up.

Parameters:f (pandas.DataFrame) – Dataframe with columns open and high.
Returns:new_column – The array containing the new feature.
Return type:pandas.Series (bool)

References

A breakaway gap represents a gap in the movement of a stock price supported by levels of high volume [IP_BAGAP].

alphapy.market_variables.gapdown(f)

Determine whether or not there has been a gap down.

Parameters:f (pandas.DataFrame) – Dataframe with columns open and close.
Returns:new_column – The array containing the new feature.
Return type:pandas.Series (bool)

References

A gap is a break between prices on a chart that occurs when the price of a stock makes a sharp move up or down with no trading occurring in between [IP_GAP].

alphapy.market_variables.gapup(f)

Determine whether or not there has been a gap up.

Parameters:f (pandas.DataFrame) – Dataframe with columns open and close.
Returns:new_column – The array containing the new feature.
Return type:pandas.Series (bool)

References

A gap is a break between prices on a chart that occurs when the price of a stock makes a sharp move up or down with no trading occurring in between [IP_GAP].

alphapy.market_variables.gtval(f, c1, c2)

Determine whether or not the first column of a dataframe is greater than the second.

Parameters:
  • f (pandas.DataFrame) – Dataframe containing the two columns c1 and c2.
  • c1 (str) – Name of the first column in the dataframe f.
  • c2 (str) – Name of the second column in the dataframe f.
Returns:

new_column – The array containing the new feature.

Return type:

pandas.Series (bool)

alphapy.market_variables.gtval0(f, c1, c2)

For positive values in the first column of the dataframe that are greater than the second column, get the value in the first column, otherwise return zero.

Parameters:
  • f (pandas.DataFrame) – Dataframe containing the two columns c1 and c2.
  • c1 (str) – Name of the first column in the dataframe f.
  • c2 (str) – Name of the second column in the dataframe f.
Returns:

new_val – A positive value or zero.

Return type:

float

alphapy.market_variables.higher(f, c, o=1)

Determine whether or not a series value is higher than the value o periods back.

Parameters:
  • f (pandas.DataFrame) – Dataframe containing the column c.
  • c (str) – Name of the column in the dataframe f.
  • o (int, optional) – Offset value for shifting the series.
Returns:

new_column – The array containing the new feature.

Return type:

pandas.Series (bool)

alphapy.market_variables.highest(f, c, p=20)

Calculate the highest value on a rolling basis.

Parameters:
  • f (pandas.DataFrame) – Dataframe containing the column c.
  • c (str) – Name of the column in the dataframe f.
  • p (int) – The period over which to calculate the rolling maximum.
Returns:

new_column – The array containing the new feature.

Return type:

pandas.Series (float)

alphapy.market_variables.hlrange(f, p=1)

Calculate the Range, the difference between High and Low.

Parameters:
  • f (pandas.DataFrame) – Dataframe with columns high and low.
  • p (int) – The period over which the range is calculated.
Returns:

new_column – The array containing the new feature.

Return type:

pandas.Series (float)

alphapy.market_variables.lower(f, c, o=1)

Determine whether or not a series value is lower than the value o periods back.

Parameters:
  • f (pandas.DataFrame) – Dataframe containing the column c.
  • c (str) – Name of the column in the dataframe f.
  • o (int, optional) – Offset value for shifting the series.
Returns:

new_column – The array containing the new feature.

Return type:

pandas.Series (bool)

alphapy.market_variables.lowest(f, c, p=20)

Calculate the lowest value on a rolling basis.

Parameters:
  • f (pandas.DataFrame) – Dataframe containing the column c.
  • c (str) – Name of the column in the dataframe f.
  • p (int) – The period over which to calculate the rolling minimum.
Returns:

new_column – The array containing the new feature.

Return type:

pandas.Series (float)

alphapy.market_variables.ma(f, c, p=20)

Calculate the mean on a rolling basis.

Parameters:
  • f (pandas.DataFrame) – Dataframe containing the column c.
  • c (str) – Name of the column in the dataframe f.
  • p (int) – The period over which to calculate the rolling mean.
Returns:

new_column – The array containing the new feature.

Return type:

pandas.Series (float)

References

In statistics, a moving average (rolling average or running average) is a calculation to analyze data points by creating a series of averages of different subsets of the full data set [WIKI_MA].

[WIKI_MA] https://en.wikipedia.org/wiki/Moving_average
alphapy.market_variables.maratio(f, c, p1=1, p2=10)

Calculate the ratio of two moving averages.

Parameters:
  • f (pandas.DataFrame) – Dataframe containing the column c.
  • c (str) – Name of the column in the dataframe f.
  • p1 (int) – The period of the first moving average.
  • p2 (int) – The period of the second moving average.
Returns:

new_column – The array containing the new feature.

Return type:

pandas.Series (float)

alphapy.market_variables.mval(f, c)

Get the negative value, otherwise zero.

Parameters:
  • f (pandas.DataFrame) – Dataframe containing the column c.
  • c (str) – Name of the column in the dataframe f.
Returns:

new_val – Negative value or zero.

Return type:

float

alphapy.market_variables.net(f, c='close', o=1)

Calculate the net change of a given column.

Parameters:
  • f (pandas.DataFrame) – Dataframe containing the column c.
  • c (str) – Name of the column in the dataframe f.
  • o (int, optional) – Offset value for shifting the series.
Returns:

new_column – The array containing the new feature.

Return type:

pandas.Series (float)

References

Net change is the difference between the closing price of a security on the day’s trading and the previous day’s closing price. Net change can be positive or negative and is quoted in terms of dollars [IP_NET].

[IP_NET] http://www.investopedia.com/terms/n/netchange.asp
alphapy.market_variables.netreturn(f, c, o=1)

Calculate the net return, or Return On Investment (ROI).

Parameters:
  • f (pandas.DataFrame) – Dataframe containing the column c.
  • c (str) – Name of the column in the dataframe f.
  • o (int, optional) – Offset value for shifting the series.
Returns:

new_column – The array containing the new feature.

Return type:

pandas.Series (float)

References

ROI measures the amount of return on an investment relative to the original cost. To calculate ROI, the benefit (or return) of an investment is divided by the cost of the investment, and the result is expressed as a percentage or a ratio [IP_ROI].

[IP_ROI] http://www.investopedia.com/terms/r/returnoninvestment.asp
alphapy.market_variables.pchange1(f, c, o=1)

Calculate the percentage change within the same variable.

Parameters:
  • f (pandas.DataFrame) – Dataframe containing the column c.
  • c (str) – Name of the column in the dataframe f.
  • o (int) – Offset to the previous value.
Returns:

new_column – The array containing the new feature.

Return type:

pandas.Series (float)

alphapy.market_variables.pchange2(f, c1, c2)

Calculate the percentage change between two variables.

Parameters:
  • f (pandas.DataFrame) – Dataframe containing the two columns c1 and c2.
  • c1 (str) – Name of the first column in the dataframe f.
  • c2 (str) – Name of the second column in the dataframe f.
Returns:

new_column – The array containing the new feature.

Return type:

pandas.Series (float)

alphapy.market_variables.pval(f, c)

Get the positive value, otherwise zero.

Parameters:
  • f (pandas.DataFrame) – Dataframe containing the column c.
  • c (str) – Name of the column in the dataframe f.
Returns:

new_val – Positive value or zero.

Return type:

float

alphapy.market_variables.rindex(f, ci, ch, cl, p=1)

Calculate the range index spanning a given period p.

The range index is a number between 0 and 100 that relates the value of the index column ci to the high column ch and the low column cl. For example, if the low value of the range is 10 and the high value is 20, then the range index for a value of 15 would be 50%. The range index for 18 would be 80%.

Parameters:
  • f (pandas.DataFrame) – Dataframe containing the columns ci, ch, and cl.
  • ci (str) – Name of the index column in the dataframe f.
  • ch (str) – Name of the high column in the dataframe f.
  • cl (str) – Name of the low column in the dataframe f.
  • p (int) – The period over which the range index of column ci is calculated.
Returns:

new_column – The array containing the new feature.

Return type:

pandas.Series (float)
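
The worked numbers in the description (a range of 10 to 20) can be reproduced with a one-line formula. The name range_index is hypothetical, and returning 0 for a zero-width range is an assumption.

```python
def range_index(value, low, high):
    """Position of value within [low, high], expressed on a 0-100 scale."""
    if high == low:
        return 0.0  # assumption: a zero-width range maps to 0
    return 100.0 * (value - low) / (high - low)


midpoint = range_index(15, 10, 20)
upper = range_index(18, 10, 20)
```
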

alphapy.market_variables.rsi(f, c, p=14)

Calculate the Relative Strength Index (RSI).

Parameters:
  • f (pandas.DataFrame) – Dataframe containing the column net.
  • c (str) – Name of the column in the dataframe f.
  • p (int) – The period over which to calculate the RSI.
Returns:

new_column – The array containing the new feature.

Return type:

pandas.Series (float)

References

Developed by J. Welles Wilder, the Relative Strength Index (RSI) is a momentum oscillator that measures the speed and change of price movements [SC_RSI].

[SC_RSI] http://stockcharts.com/school/doku.php?id=chart_school:technical_indicators:relative_strength_index_rsi
alphapy.market_variables.truehigh(f)

Calculate the True High value.

Parameters:f (pandas.DataFrame) – Dataframe with columns high and close.
Returns:new_column – The array containing the new feature.
Return type:pandas.Series (float)

References

Today’s high, or the previous close, whichever is higher [TS_TR].

[TS_TR] http://help.tradestation.com/09_01/tradestationhelp/charting_definitions/true_range.htm
alphapy.market_variables.truelow(f)

Calculate the True Low value.

Parameters:f (pandas.DataFrame) – Dataframe with columns low and close.
Returns:new_column – The array containing the new feature.
Return type:pandas.Series (float)

References

Today’s low, or the previous close, whichever is lower [TS_TR].

alphapy.market_variables.truerange(f)

Calculate the True Range value.

Parameters:f (pandas.DataFrame) – Dataframe with columns high and low.
Returns:new_column – The array containing the new feature.
Return type:pandas.Series (float)

References

True High - True Low [TS_TR].
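
The three definitions above (True High, True Low, True Range) fit together as in this sketch. It uses plain lists instead of a dataframe, the name true_range is hypothetical, and skipping the first bar (which has no previous close) is an assumption.

```python
def true_range(highs, lows, closes):
    """Return (true high, true low, true range) for each bar after the first."""
    results = []
    for i in range(1, len(highs)):
        prev_close = closes[i - 1]
        true_high = max(highs[i], prev_close)  # high or previous close
        true_low = min(lows[i], prev_close)    # low or previous close
        results.append((true_high, true_low, true_high - true_low))
    return results


highs = [10.0, 12.0, 11.0]
lows = [9.0, 11.0, 9.0]
closes = [9.5, 11.5, 10.0]
bars = true_range(highs, lows, closes)
```

In the second bar the market gaps up, so the previous close (9.5) becomes the true low; in the third bar the previous close (11.5) becomes the true high.
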

alphapy.market_variables.up(f, c)

Find the positive values in the series.

Parameters:
  • f (pandas.DataFrame) – Dataframe containing the column c.
  • c (str) – Name of the column in the dataframe f.
Returns:

new_column – The array containing the new feature.

Return type:

pandas.Series (bool)

alphapy.market_variables.upc(f, c)

Get the positive values, with negative values zeroed.

Parameters:
  • f (pandas.DataFrame) – Dataframe with column c.
  • c (str) – Name of the column.
Returns:

new_column – The array containing the new feature.

Return type:

pandas.Series (float)

alphapy.market_variables.vapply(group, vname, vfuncs=None)

Apply a variable to multiple dataframes.

Parameters:
  • group (alphapy.Group) – The input group.
  • vname (str) – The variable to apply to the group.
  • vfuncs (dict, optional) – Dictionary of external modules and functions.
Returns:

None

Return type:

None

Other Parameters:
 

Frame.frames (dict) – Global dictionary of dataframes

See also

vunapply()

alphapy.market_variables.vexec(f, v, vfuncs=None)

Add a variable to the given dataframe.

This is the core function for adding a variable to a dataframe. The default variable functions are already defined locally in alphapy.market_variables; however, you may want to define your own variable functions. If so, then the vfuncs parameter contains the dictionary of modules and functions to be imported and applied by the vexec function.

To write your own variable function, your function must have a pandas DataFrame as an input parameter and must return a pandas Series that represents the new variable.

Parameters:
  • f (pandas.DataFrame) – Dataframe to contain the new variable.
  • v (str) – Variable to add to the dataframe.
  • vfuncs (dict, optional) – Dictionary of external modules and functions.
Returns:

f – Dataframe with the new variable.

Return type:

pandas.DataFrame

Other Parameters:
 

Variable.variables (dict) – Global dictionary of variables
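
A user-defined variable function that meets the contract described above (a pandas DataFrame in, a pandas Series out) might look like the following sketch. The function midrange and the column names high and low are hypothetical, and how such a function is registered through vfuncs is not shown here.

```python
import pandas as pd


def midrange(f, p=1):
    """Hypothetical custom variable: midpoint of the rolling high and low."""
    rolling_high = f['high'].rolling(p).max()
    rolling_low = f['low'].rolling(p).min()
    # The result is a Series aligned with the index of f.
    return (rolling_high + rolling_low) / 2.0


frame = pd.DataFrame({'high': [11.0, 12.0, 13.0], 'low': [9.0, 10.0, 9.0]})
new_column = midrange(frame)
frame['midrange'] = new_column
```
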

alphapy.market_variables.vmapply(group, vs, vfuncs=None)

Apply multiple variables to multiple dataframes.

Parameters:
  • group (alphapy.Group) – The input group.
  • vs (list) – The list of variables to apply to the group.
  • vfuncs (dict, optional) – Dictionary of external modules and functions.
Returns:

None

Return type:

None

See also

vmunapply()

alphapy.market_variables.vmunapply(group, vs)

Remove a list of variables from multiple dataframes.

Parameters:
  • group (alphapy.Group) – The input group.
  • vs (list) – The list of variables to remove from the group.
Returns:

None

Return type:

None

See also

vmapply()

alphapy.market_variables.vparse(vname)

Parse a variable name into its respective components.

Parameters:vname (str) – The name of the variable.
Returns:
  • vxlag (str) – Variable name without the lag component.
  • root (str) – The base variable name without the parameters.
  • plist (list) – The parameter list.
  • lag (int) – The offset starting with the current value [0] and counting back, e.g., an offset [1] means the previous value of the variable.

Notes

AlphaPy makes feature creation easy. The syntax of a variable name maps to a function call:

xma_20_50 => xma(20, 50)

Examples

>>> vparse('xma_20_50[1]')
# ('xma_20_50', 'xma', ['20', '50'], 1)
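
The name-to-components mapping can be approximated with a regular expression. This is a sketch under stated assumptions, not the library's parser: vparse_sketch is a hypothetical name, the root is taken to be the text before the first underscore, and a missing lag is treated as 0.

```python
import re


def vparse_sketch(vname):
    """Split a variable name into (vxlag, root, plist, lag)."""
    # Separate an optional trailing lag such as "[1]" from the rest.
    match = re.fullmatch(r'(.+?)(?:\[(\d+)\])?', vname)
    vxlag = match.group(1)
    lag = int(match.group(2)) if match.group(2) else 0
    # Assumption: the root precedes the first underscore; the rest are parameters.
    root, *plist = vxlag.split('_')
    return vxlag, root, plist, lag


parsed = vparse_sketch('xma_20_50[1]')
unlagged = vparse_sketch('xma_20_50')
```
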
alphapy.market_variables.vsub(v, expr)

Substitute the variable parameters into the expression.

This function performs the parameter substitution when applying features to a dataframe. It is a mechanism for the user to override the default values in any given expression when defining a feature, instead of having to programmatically call a function with new values.

Parameters:
  • v (str) – Variable name.
  • expr (str) – The expression for substitution.
Returns:

newexpr – The expression with the new, substituted values.

Return type:

str

alphapy.market_variables.vtree(vname)

Get all of the antecedent variables.

Before applying a variable to a dataframe, we have to recursively get all of the child variables, beginning with the starting variable’s expression. Then, we have to extract the variables from all the subsequent expressions. This process continues until all antecedent variables are obtained.

Parameters:vname (str) – A valid variable stored in Variable.variables.
Returns:all_variables – The variables that need to be applied before vname.
Return type:list
Other Parameters:
 Variable.variables (dict) – Global dictionary of variables
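
The recursive collection of antecedent variables can be sketched as a depth-first walk over stored expressions. The dictionary contents, the name antecedents, and the identifier regex are all hypothetical; the real expression grammar may differ.

```python
import re

# Hypothetical variable store: 'signal' depends on 'hc' and 'rrunder'.
variables = {
    'hc': 'higher_close',
    'rrunder': 'rr_3_20 <= 0.9',
    'signal': 'hc & rrunder',
}


def antecedents(vname, known):
    """Collect every name that vname depends on, deepest dependencies first."""
    ordered = []
    seen = {vname}  # guards against revisiting names and against cycles

    def visit(name):
        # Names without a stored expression are base features with no children.
        for child in re.findall(r'[A-Za-z_]\w*', known.get(name, '')):
            if child not in seen:
                seen.add(child)
                visit(child)
                ordered.append(child)

    visit(vname)
    return ordered


needed = antecedents('signal', variables)
```

Each name appears after everything it depends on, so applying the list in order satisfies the dependencies.
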
alphapy.market_variables.vunapply(group, vname)

Remove a variable from multiple dataframes.

Parameters:
  • group (alphapy.Group) – The input group.
  • vname (str) – The variable to remove from the group.
Returns:

None

Return type:

None

Other Parameters:
 

Frame.frames (dict) – Global dictionary of dataframes

See also

vapply()

alphapy.market_variables.xmadown(f, c='close', pfast=20, pslow=50)

Determine those values of the dataframe where the fast moving average crosses below the slow moving average.

Parameters:
  • f (pandas.DataFrame) – Dataframe containing the column c.
  • c (str, optional) – Name of the column in the dataframe f.
  • pfast (int, optional) – The period of the fast moving average.
  • pslow (int, optional) – The period of the slow moving average.
Returns:

new_column – The array containing the new feature.

Return type:

pandas.Series (bool)

References

In the statistics of time series, and in particular the analysis of financial time series for stock trading purposes, a moving-average crossover occurs when, on plotting two moving averages each based on different degrees of smoothing, the traces of these moving averages cross [WIKI_XMA].

[WIKI_XMA] https://en.wikipedia.org/wiki/Moving_average_crossover
alphapy.market_variables.xmaup(f, c='close', pfast=20, pslow=50)

Determine those values of the dataframe where the fast moving average crosses above the slow moving average.

Parameters:
  • f (pandas.DataFrame) – Dataframe containing the column c.
  • c (str, optional) – Name of the column in the dataframe f.
  • pfast (int, optional) – The period of the fast moving average.
  • pslow (int, optional) – The period of the slow moving average.
Returns:

new_column – The array containing the new feature.

Return type:

pandas.Series (bool)

References

In the statistics of time series, and in particular the analysis of financial time series for stock trading purposes, a moving-average crossover occurs when, on plotting two moving averages each based on different degrees of smoothing, the traces of these moving averages cross [WIKI_XMA].
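
A moving-average crossover can be detected by comparing the fast and slow averages on consecutive bars. This sketch uses plain lists and simple moving averages; the names sma and crossovers are hypothetical, and treating equality on the prior bar as "not yet crossed" is an assumption.

```python
def sma(values, p):
    """Simple moving average; None until p values are available."""
    return [None if i + 1 < p else sum(values[i + 1 - p:i + 1]) / p
            for i in range(len(values))]


def crossovers(values, pfast, pslow):
    """Return (up, down) boolean lists marking the bars where a cross occurs."""
    fast, slow = sma(values, pfast), sma(values, pslow)
    up, down = [], []
    for i in range(len(values)):
        # Both averages must exist on the current and the prior bar.
        ready = i > 0 and None not in (fast[i - 1], slow[i - 1], fast[i], slow[i])
        up.append(ready and fast[i] > slow[i] and fast[i - 1] <= slow[i - 1])
        down.append(ready and fast[i] < slow[i] and fast[i - 1] >= slow[i - 1])
    return up, down


closes = [5.0, 4.0, 3.0, 4.0, 6.0, 5.0, 2.0, 1.0]
up, down = crossovers(closes, pfast=2, pslow=3)
```

With these prices the fast average crosses above the slow one at index 4 and back below it at index 6.
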

alphapy.model module

class alphapy.model.Model(specs)

Create a new model.

Parameters:

specs (dict) – The model specifications obtained by reading the model.yml file.

Variables:
  • specs (dict) – The model specifications.
  • X_train (pandas.DataFrame) – Training features in matrix format.
  • X_test (pandas.DataFrame) – Testing features in matrix format.
  • y_train (pandas.Series) – Training labels in vector format.
  • y_test (pandas.Series) – Testing labels in vector format.
  • algolist (list) – Algorithms to use in training.
  • estimators (dict) – Dictionary of estimators (key: algorithm)
  • importances (dict) – Feature Importances (key: algorithm)
  • coefs (dict) – Coefficients, if applicable (key: algorithm)
  • support (dict) – Support Vectors, if applicable (key: algorithm)
  • preds (dict) – Predictions or labels (keys: algorithm, partition)
  • probas (dict) – Probabilities from classification (keys: algorithm, partition)
  • metrics (dict) – Model evaluation metrics (keys: algorithm, partition, metric)
Raises:

KeyError – Model specs must include the key algorithms, which is stored in algolist.

alphapy.model.first_fit(model, algo, est)

Fit the model before optimization.

Parameters:
  • model (alphapy.Model) – The model object with specifications.
  • algo (str) – Abbreviation of the algorithm to run.
  • est (alphapy.Estimator) – The estimator to fit.
Returns:

model – The model object with the initial estimator.

Return type:

alphapy.Model

Notes

AlphaPy fits an initial model because the user may choose to get a first score without any additional feature selection or grid search. XGBoost is a special case because it has the advantage of an eval_set and early_stopping_rounds, which can speed up the estimation phase.

alphapy.model.generate_metrics(model, partition)

Generate model evaluation metrics for all estimators.

Parameters:
  • model (alphapy.Model) – The model object with stored predictions.
  • partition (alphapy.Partition) – Reference to the dataset.
Returns:

model – The model object with the completed metrics.

Return type:

alphapy.Model

Notes

AlphaPy takes a brute-force approach to calculating each metric. It calls every scikit-learn function without exception. If the calculation fails for any reason, then the evaluation will still continue without error.

References

For more information about model evaluation and the associated metrics, refer to [EVAL].

[EVAL] http://scikit-learn.org/stable/modules/model_evaluation.html
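
The brute-force approach described in the notes, where every metric is attempted and failures do not stop the evaluation, can be sketched with a try/except loop. The metric functions here are simple hypothetical stand-ins written in plain Python, not scikit-learn calls.

```python
def accuracy(y_true, y_pred):
    """Fraction of exact matches; works for any label type."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_abs_error(y_true, y_pred):
    """Mean absolute error; only defined for numeric values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def evaluate_all(metric_funcs, y_true, y_pred):
    """Try every metric and keep the ones that succeed."""
    results = {}
    for name, func in metric_funcs.items():
        try:
            results[name] = func(y_true, y_pred)
        except Exception:
            # The metric does not apply to this data; skip it and keep going.
            continue
    return results


metric_funcs = {'accuracy': accuracy, 'mae': mean_abs_error}
scores = evaluate_all(metric_funcs, ['a', 'b', 'b'], ['a', 'b', 'a'])
```

With string labels the error metric raises a TypeError and is skipped, while accuracy is still recorded.
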
alphapy.model.get_class_weights(model)

Set the class weights for fitting the model.

Parameters:model (alphapy.Model) – The model object with specifications.
Returns:model – The model object with class weights.
Return type:alphapy.Model
alphapy.model.get_model_config()

Read in the configuration file for AlphaPy.

Parameters:None (None)
Returns:specs – The parameters for controlling AlphaPy.
Return type:dict
Raises:ValueError – Unrecognized value of a model.yml field.
alphapy.model.load_feature_map(model, directory)

Load the feature map from storage. By default, the most recent feature map is loaded into memory.

Parameters:
  • model (alphapy.Model) – The model object to contain the feature map.
  • directory (str) – Full directory specification of the feature map’s location.
Returns:

model – The model object containing the feature map.

Return type:

alphapy.Model

alphapy.model.load_predictor(directory)

Load the model predictor from storage. By default, the most recent model is loaded into memory.

Parameters:directory (str) – Full directory specification of the predictor’s location.
Returns:predictor – The scoring function.
Return type:function
alphapy.model.make_predictions(model, algo, calibrate)

Make predictions for the training and testing data.

Parameters:
  • model (alphapy.Model) – The model object with specifications.
  • algo (str) – Abbreviation of the algorithm to make predictions.
  • calibrate (bool) – If True, calibrate the probabilities for a classifier.
Returns:

model – The model object with the predictions.

Return type:

alphapy.Model

Notes

For classification, calibration is a precursor to making the actual predictions. In this case, AlphaPy predicts both labels and probabilities. For regression, real values are predicted.

alphapy.model.predict_best(model)

Select the best model based on score.

Parameters:model (alphapy.Model) – The model object with all of the estimators.
Returns:model – The model object with the best estimator.
Return type:alphapy.Model

Notes

Best model selection is based on a scoring function. If the objective is to minimize (e.g., negative log loss), then we select the model with the algorithm that has the lowest score. If the objective is to maximize, then we select the algorithm with the highest score (e.g., AUC).

For multiple algorithms, AlphaPy always creates a blended model. Therefore, the best algorithm that is selected could actually be the blended model itself.
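
The selection rule in the notes reduces to taking a minimum or a maximum over the per-algorithm scores. In this sketch the function name select_best, the objective strings, the algorithm labels, and the scores are all hypothetical.

```python
def select_best(scores, objective):
    """Return the algorithm whose score is best for the given objective."""
    if objective == 'minimize':
        return min(scores, key=scores.get)
    if objective == 'maximize':
        return max(scores, key=scores.get)
    raise ValueError(f'unknown objective: {objective}')


# Hypothetical scores; 'BLEND' stands for the blended model.
auc_scores = {'LOGR': 0.81, 'RF': 0.86, 'BLEND': 0.88}
loss_scores = {'LOGR': 0.52, 'RF': 0.47, 'BLEND': 0.49}
best_by_auc = select_best(auc_scores, 'maximize')
best_by_loss = select_best(loss_scores, 'minimize')
```

Because the blended model is scored alongside the individual algorithms, it can be selected as the best model, as it is for AUC here.
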

alphapy.model.predict_blend(model)

Make predictions from a blended model.

Parameters:model (alphapy.Model) – The model object with all of the estimators.
Returns:model – The model object with the blended estimator.
Return type:alphapy.Model

Notes

For classification, AlphaPy uses logistic regression for creating a blended model. For regression, ridge regression is applied.

alphapy.model.save_feature_map(model, timestamp)

Save the feature map to disk.

Parameters:
  • model (alphapy.Model) – The model object containing the feature map.
  • timestamp (str) – Date in yyyy-mm-dd format.
Returns:

None

Return type:

None

alphapy.model.save_model(model, tag, partition)

Save the results in the model file.

Parameters:
  • model (alphapy.Model) – The model object to save.
  • tag (str) – A unique identifier for the output files, e.g., a date stamp.
  • partition (alphapy.Partition) – Reference to the dataset.
Returns:

None

Return type:

None

Notes

The following components are extracted from the model object and saved to disk:

  • Model predictor (via joblib/pickle)
  • Predictions
  • Probabilities (classification only)
  • Rankings
  • Submission File (optional)
alphapy.model.save_predictions(model, tag, partition)

Save the predictions to disk.

Parameters:
  • model (alphapy.Model) – The model object to save.
  • tag (str) – A unique identifier for the output files, e.g., a date stamp.
  • partition (alphapy.Partition) – Reference to the dataset.
Returns:

  • preds (numpy array) – The prediction vector.
  • probas (numpy array) – The probability vector.

alphapy.model.save_predictor(model, timestamp)

Save the time-stamped model predictor to disk.

Parameters:
  • model (alphapy.Model) – The model object that contains the best estimator.
  • timestamp (str) – Date in yyyy-mm-dd format.
Returns:

None

Return type:

None

alphapy.optimize module

alphapy.optimize.grid_report(results, n_top=3)

Report the top grid search scores.

Parameters:
  • results (dict of numpy arrays) – Mean test scores for each grid search iteration.
  • n_top (int, optional) – The number of grid search results to report.
Returns:

None

Return type:

None

alphapy.optimize.hyper_grid_search(model, estimator)

Return the best hyperparameters for a grid search.

Parameters:
  • model (alphapy.Model) – The model object with grid search parameters.
  • estimator (alphapy.Estimator) – The estimator containing the hyperparameter grid.
Returns:

model – The model object with the grid search estimator.

Return type:

alphapy.Model

Notes

To reduce the time required for grid search, use either randomized grid search with a fixed number of iterations or a full grid search with subsampling. AlphaPy uses the scikit-learn Pipeline with feature selection to reduce the feature space.

References

For more information about grid search, refer to [GRID].

[GRID] http://scikit-learn.org/stable/modules/grid_search.html#grid-search

To learn about pipelines, refer to [PIPE].

[PIPE] http://scikit-learn.org/stable/modules/pipeline.html#pipeline
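
As a hedged illustration of the note above (not AlphaPy's own implementation), a randomized search with a fixed number of iterations over a scikit-learn Pipeline that includes a feature-selection step might look like the following. The synthetic dataset, the SelectKBest selector, the LogisticRegression estimator, and the parameter ranges are all assumptions chosen for the sketch.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import Pipeline

# Synthetic data standing in for a real training set (assumption).
X, y = make_classification(n_samples=200, n_features=20,
                           n_informative=5, random_state=0)

# Feature selection runs inside the pipeline, so it is refit on every fold.
pipeline = Pipeline([
    ("select", SelectKBest(score_func=f_classif)),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Hypothetical search space; step names prefix the parameter names.
param_distributions = {
    "select__k": [5, 10, 15],
    "clf__C": [0.01, 0.1, 1.0, 10.0],
}

# n_iter caps the number of sampled candidates, unlike a full grid search.
search = RandomizedSearchCV(pipeline, param_distributions,
                            n_iter=5, cv=3, random_state=0)
search.fit(X, y)
best_params = search.best_params_
```

Only five of the twelve possible combinations are evaluated here, which is the time saving the note refers to.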

alphapy.optimize.rfe_search(model, algo)

Return the best feature set using recursive feature elimination.

Parameters:
  • model (alphapy.Model) – The model object with RFE parameters.
  • algo (str) – Abbreviation of the algorithm to run.
Returns:

model – The model object with the RFE support vector and the best estimator.

Return type:

alphapy.Model

See also

rfecv_search()

Notes

If a scoring function is available, then AlphaPy can perform RFE with Cross-Validation (CV); otherwise, it just does RFE without CV, as in this function.

References

For more information about Recursive Feature Elimination, refer to [RFE].

[RFE] http://scikit-learn.org/stable/modules/feature_selection.html#recursive-feature-elimination

alphapy.optimize.rfecv_search(model, algo)

Return the best feature set using recursive feature elimination with cross-validation.

Parameters:
  • model (alphapy.Model) – The model object with RFE parameters.
  • algo (str) – Abbreviation of the algorithm to run.
Returns:

model – The model object with the RFE support vector and the best estimator.

Return type:

alphapy.Model

See also

rfe_search()

Notes

If a scoring function is available, then AlphaPy can perform RFE with Cross-Validation (CV), as in this function; otherwise, it just does RFE without CV.

References

For more information about Recursive Feature Elimination, refer to [RFECV].

[RFECV] http://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.RFECV.html
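
The difference between plain RFE and RFE with cross-validation can be sketched directly with scikit-learn. This is a minimal example under stated assumptions (synthetic data, a LogisticRegression estimator, accuracy as the scoring function); it does not reproduce how AlphaPy wires these classes into its model object.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE, RFECV
from sklearn.linear_model import LogisticRegression

# Synthetic data and estimator are assumptions for the sketch.
X, y = make_classification(n_samples=200, n_features=10,
                           n_informative=4, random_state=0)
estimator = LogisticRegression(max_iter=1000)

# Plain RFE: the number of features to keep is fixed in advance.
rfe = RFE(estimator, n_features_to_select=4, step=1).fit(X, y)

# RFECV: a scoring function and cross-validation choose the number of features.
rfecv = RFECV(estimator, step=1, cv=3, scoring="accuracy").fit(X, y)

rfe_support = rfe.support_      # Boolean mask of the selected features
rfecv_support = rfecv.support_  # Boolean mask chosen by cross-validation
```

In both cases the support vector is a Boolean mask over the original columns.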

alphapy.plots module

alphapy.plots.generate_plots(model, partition)

Generate plots while running the pipeline.

Parameters:
  • model (alphapy.Model) – The model object with plotting specifications.
  • partition (alphapy.Partition) – Reference to the dataset.
Returns:

None

Return type:

None

alphapy.plots.get_partition_data(model, partition)

Get the X, y pair for a given model and partition.

Parameters:
  • model (alphapy.Model) – The model object with partition data.
  • partition (alphapy.Partition) – Reference to the dataset.
Returns:

  • X (numpy array) – The feature matrix.
  • y (numpy array) – The target vector.

Raises:

TypeError – Partition must be train or test.

alphapy.plots.get_plot_directory(model)

Get the plot output directory of a model.

Parameters:model (alphapy.Model) – The model object with directory information.
Returns:plot_directory – The output directory to write the plot.
Return type:str
alphapy.plots.plot_boundary(model, partition, f1=0, f2=1)

Display a comparison of classifiers.

Parameters:
  • model (alphapy.Model) – The model object with plotting specifications.
  • partition (alphapy.Partition) – Reference to the dataset.
  • f1 (int) – Number of the first feature to compare.
  • f2 (int) – Number of the second feature to compare.
Returns:

None

Return type:

None

References

Code excerpts from authors:

  • Gael Varoquaux
  • Andreas Muller

http://scikit-learn.org/stable/auto_examples/classification/plot_classifier_comparison.html

alphapy.plots.plot_box(df, x, y, hue, tag='eda', directory=None)

Display a Box Plot.

Parameters:
  • df (pandas.DataFrame) – The dataframe containing the x and y features.
  • x (str) – Variable name in df to display along the x-axis.
  • y (str) – Variable name in df to display along the y-axis.
  • hue (str) – Variable name to be used as hue, i.e., another data dimension.
  • tag (str) – Unique identifier for the plot.
  • directory (str, optional) – The full specification of the plot location.
Returns:

None

Return type:

None

References

http://seaborn.pydata.org/generated/seaborn.boxplot.html

alphapy.plots.plot_calibration(model, partition)

Display scikit-learn calibration plots.

Parameters:
  • model (alphapy.Model) – The model object with plotting specifications.
  • partition (alphapy.Partition) – Reference to the dataset.
Returns:

None

Return type:

None

References

Code excerpts from authors:

http://scikit-learn.org/stable/auto_examples/calibration/plot_calibration_curve.html#sphx-glr-auto-examples-calibration-plot-calibration-curve-py

alphapy.plots.plot_candlestick(df, symbol, datecol='date', directory=None)

Plot time series data as a candlestick chart.

Parameters:
  • df (pandas.DataFrame) – The dataframe containing the target feature.
  • symbol (str) – Unique identifier of the data to plot.
  • datecol (str, optional) – The name of the date column.
  • directory (str, optional) – The full specification of the plot location.
Returns:

None

Return type:

None

Notes

The dataframe df must contain these columns:

  • open
  • high
  • low
  • close

References

http://bokeh.pydata.org/en/latest/docs/gallery/candlestick.html

alphapy.plots.plot_confusion_matrix(model, partition)

Draw the confusion matrix.

Parameters:
  • model (alphapy.Model) – The model object with plotting specifications.
  • partition (alphapy.Partition) – Reference to the dataset.
Returns:

None

Return type:

None

References

http://scikit-learn.org/stable/modules/model_evaluation.html#confusion-matrix

alphapy.plots.plot_distribution(df, target, tag='eda', directory=None)

Display a Distribution Plot.

Parameters:
  • df (pandas.DataFrame) – The dataframe containing the target feature.
  • target (str) – The target variable for the distribution plot.
  • tag (str) – Unique identifier for the plot.
  • directory (str, optional) – The full specification of the plot location.
Returns:

None

Return type:

None

References

http://seaborn.pydata.org/generated/seaborn.distplot.html

alphapy.plots.plot_facet_grid(df, target, frow, fcol, tag='eda', directory=None)

Plot a Seaborn faceted histogram grid.

Parameters:
  • df (pandas.DataFrame) – The dataframe containing the features.
  • target (str) – The target variable for contrast.
  • frow (list of str) – Feature names for the row elements of the grid.
  • fcol (list of str) – Feature names for the column elements of the grid.
  • tag (str) – Unique identifier for the plot.
  • directory (str, optional) – The full specification of the plot location.
Returns:

None

Return type:

None

References

http://seaborn.pydata.org/generated/seaborn.FacetGrid.html

alphapy.plots.plot_importance(model, partition)

Display scikit-learn feature importances.

Parameters:
  • model (alphapy.Model) – The model object with plotting specifications.
  • partition (alphapy.Partition) – Reference to the dataset.
Returns:

None

Return type:

None

References

http://scikit-learn.org/stable/auto_examples/ensemble/plot_forest_importances.html

alphapy.plots.plot_learning_curve(model, partition)

Generate learning curves for a given partition.

Parameters:
  • model (alphapy.Model) – The model object with plotting specifications.
  • partition (alphapy.Partition) – Reference to the dataset.
Returns:

None

Return type:

None

References

http://scikit-learn.org/stable/auto_examples/model_selection/plot_learning_curve.html

alphapy.plots.plot_partial_dependence(est, X, features, fnames, tag, n_jobs=-1, verbosity=0, directory=None)

Display a Partial Dependence Plot.

Parameters:
  • est (estimator) – The scikit-learn estimator for calculating partial dependence.
  • X (numpy array) – The data on which the estimator was trained.
  • features (list of int) – Feature numbers of X.
  • fnames (list of str) – The feature names to plot.
  • tag (str) – Unique identifier for the plot.
  • n_jobs (int, optional) – The maximum number of parallel jobs.
  • verbosity (int, optional) – The amount of logging from 0 (minimum) and higher.
  • directory (str) – Directory where the plot will be stored.
Returns:

None

Return type:

None

References

http://scikit-learn.org/stable/auto_examples/ensemble/plot_partial_dependence.html#sphx-glr-auto-examples-ensemble-plot-partial-dependence-py

alphapy.plots.plot_roc_curve(model, partition)

Display ROC Curves with Cross-Validation.

Parameters:
  • model (alphapy.Model) – The model object with plotting specifications.
  • partition (alphapy.Partition) – Reference to the dataset.
Returns:

None

Return type:

None

References

http://scikit-learn.org/stable/modules/model_evaluation.html#receiver-operating-characteristic-roc

alphapy.plots.plot_scatter(df, features, target, tag='eda', directory=None)

Plot a scatterplot matrix, also known as a pair plot.

Parameters:
  • df (pandas.DataFrame) – The dataframe containing the features.
  • features (list of str) – The features to compare in the scatterplot.
  • target (str) – The target variable for contrast.
  • tag (str) – Unique identifier for the plot.
  • directory (str, optional) – The full specification of the plot location.
Returns:

None

Return type:

None

References

https://seaborn.pydata.org/examples/scatterplot_matrix.html

alphapy.plots.plot_swarm(df, x, y, hue, tag='eda', directory=None)

Display a Swarm Plot.

Parameters:
  • df (pandas.DataFrame) – The dataframe containing the x and y features.
  • x (str) – Variable name in df to display along the x-axis.
  • y (str) – Variable name in df to display along the y-axis.
  • hue (str) – Variable name to be used as hue, i.e., another data dimension.
  • tag (str) – Unique identifier for the plot.
  • directory (str, optional) – The full specification of the plot location.
Returns:

None

Return type:

None

References

http://seaborn.pydata.org/generated/seaborn.swarmplot.html

alphapy.plots.plot_time_series(df, target, tag='eda', directory=None)

Plot time series data.

Parameters:
  • df (pandas.DataFrame) – The dataframe containing the target feature.
  • target (str) – The target variable for the time series plot.
  • tag (str) – Unique identifier for the plot.
  • directory (str, optional) – The full specification of the plot location.
Returns:

None

Return type:

None

References

http://seaborn.pydata.org/generated/seaborn.tsplot.html

alphapy.plots.plot_validation_curve(model, partition, pname, prange)

Generate scikit-learn validation curves.

Parameters:
  • model (alphapy.Model) – The model object with plotting specifications.
  • partition (alphapy.Partition) – Reference to the dataset.
  • pname (str) – Name of the hyperparameter to test.
  • prange (numpy array) – The values of the hyperparameter that will be evaluated.
Returns:

None

Return type:

None

References

http://scikit-learn.org/stable/auto_examples/model_selection/plot_validation_curve.html#sphx-glr-auto-examples-model-selection-plot-validation-curve-py

alphapy.plots.write_plot(vizlib, plot, plot_type, tag, directory=None)

Save the plot to a file, or display it interactively.

Parameters:
  • vizlib (str) – The visualization library: 'matplotlib', 'seaborn', or 'bokeh'.
  • plot (module) – Plotting context, e.g., plt.
  • plot_type (str) – Type of plot to generate.
  • tag (str) – Unique identifier for the plot.
  • directory (str, optional) – The full specification for the directory location. If directory is None, then the plot is displayed interactively.
Returns:

None

Return type:

None

Raises:

ValueError – Unrecognized data visualization library.

References

Visualization Libraries:

alphapy.portfolio module

class alphapy.portfolio.Portfolio(group_name, tag, space=<alphapy.space.Space instance>, maxpos=10, posby='close', kopos=0, koby='-profit', restricted=False, weightby='quantity', startcap=100000, margin=0.5, mincash=0.2, fixedfrac=0.1, maxloss=0.1)

Create a new portfolio with a unique name. All portfolios are stored in Portfolio.portfolios.

Parameters:
  • group_name (str) – The group represented in the portfolio.
  • tag (str) – A unique identifier.
  • space (alphapy.Space, optional) – Namespace for the portfolio.
  • maxpos (int, optional) – The maximum number of positions.
  • posby (str, optional) – The denominator for position sizing.
  • kopos (int, optional) – The number of positions to kick out from the portfolio.
  • koby (str, optional) – The “kick out” criterion. For example, a koby value of ‘-profit’ means that the kopos least profitable positions will be closed.
  • restricted (bool, optional) – If True, then the portfolio is limited to a maximum number of positions maxpos.
  • weightby (str, optional) – The weighting variable to balance the portfolio, e.g., by closing price, by volatility, or by any column.
  • startcap (float, optional) – The amount of starting capital.
  • margin (float, optional) – The amount of margin required, expressed as a fraction.
  • mincash (float, optional) – Minimum amount of cash on hand, expressed as a fraction of the total portfolio value.
  • fixedfrac (float, optional) – The fixed fraction for any given position.
  • maxloss (float, optional) – Stop loss for any given position.
Variables:
  • portfolios (dict) – Class variable for storing all known portfolios
  • value (float) – Current value of the portfolio.
  • netprofit (float) – Net profit ($) since previous valuation.
  • netreturn (float) – Net return (%) since previous valuation.
  • totalprofit (float) – Total profit ($) since inception.
  • totalreturn (float) – Total return (%) since inception.
portfolios = {}
class alphapy.portfolio.Position(portfolio, name, opendate)

Create a new position in the portfolio.

Parameters:
  • portfolio (alphapy.Portfolio) – The portfolio that will contain the position.
  • name (str) – A unique identifier such as a stock symbol.
  • opendate (datetime) – Date the position is opened.
Variables:
  • date (datetime) – Current date of the position.
  • name (str) – A unique identifier.
  • status (str) – State of the position: 'opened' or 'closed'.
  • mpos (str) – Market position 'long' or 'short'.
  • quantity (float) – The net size of the position.
  • price (float) – The current price of the instrument.
  • value (float) – The total dollar value of the position.
  • profit (float) – The net profit of the current position.
  • netreturn (float) – The Return On Investment (ROI), or net return.
  • opened (datetime) – Date the position is opened.
  • held (int) – The holding period since the position was opened.
  • costbasis (float) – Overall cost basis.
  • trades (list of Trade) – The executed trades for the position so far.
  • ntrades (int) – Total number of trades.
  • pdata (pandas DataFrame) – Price data for the given name.
  • multiplier (float) – Multiple for instrument type (e.g., 1.0 for stocks).
class alphapy.portfolio.Trade(name, order, quantity, price, tdate)

Initiate a trade.

Parameters:
  • name (str) – The symbol to trade.
  • order (alphapy.Orders) – Long or short trade for entry or exit.
  • quantity (int) – The quantity for the order.
  • price (float) – The execution price of the trade.
  • tdate (datetime) – The date and time of the trade.
Variables:

states (list of str) – Trade state names for a dataframe.

states = ['name', 'order', 'quantity', 'price']
alphapy.portfolio.add_position(p, name, pos)

Add a position to a portfolio.

Parameters:
  • p (alphapy.Portfolio) – Portfolio that will hold the position.
  • name (str) – Unique identifier for the position, e.g., a stock symbol.
  • pos (alphapy.Position) – New position to add to the portfolio.
Returns:

p – Portfolio with the new position.

Return type:

alphapy.Portfolio

alphapy.portfolio.allocate_trade(p, pos, trade)

Determine the trade allocation for a given portfolio.

Parameters:
  • p (alphapy.Portfolio) – Portfolio that will hold the new position.
  • pos (alphapy.Position) – Position to update.
  • trade (alphapy.Trade) – The proposed trade.
Returns:

allocation – The trade size that can be allocated for the portfolio.

Return type:

float

alphapy.portfolio.balance(p, tdate, cashlevel)

Balance the portfolio using a weighting variable.

Rebalancing is the process of equalizing a portfolio’s positions using some criterion. For example, if a portfolio is dollar-weighted, then one position can increase in proportion to the rest of the portfolio, i.e., its fraction of the overall portfolio is greater than the other positions. To make the portfolio “equal dollar”, some positions have to be decreased and others increased.

The rebalancing process is periodic (e.g., once per month) and generates a series of trades to balance the positions. Other portfolios are volatility-weighted because a more volatile stock has a greater effect on the beta, i.e., the more volatile the instrument, the smaller the position size.

Technically, any type of weight can be used for rebalancing, so AlphaPy gives the user the ability to specify a weightby column name.
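
The equal-dollar case described above can be sketched in a few lines. This is an illustration only: the function name, the dictionary of position values, and the symbols are hypothetical, and the sketch ignores the cash level and the weightby column that the real function takes into account.

```python
def equal_dollar_adjustments(values):
    """Return the dollar change each position needs to reach equal weight.

    `values` maps a symbol to its current dollar value (hypothetical input).
    A negative adjustment means the position must be decreased.
    """
    target = sum(values.values()) / len(values)
    return {name: target - value for name, value in values.items()}


# One position has grown relative to the others.
positions = {"AAA": 6000.0, "BBB": 3000.0, "CCC": 3000.0}
adjustments = equal_dollar_adjustments(positions)
```

Here the target is 4,000 per position, so AAA is reduced by 2,000 while BBB and CCC are each increased by 1,000.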

Parameters:
  • p (alphapy.Portfolio) – Portfolio to rebalance.
  • tdate (datetime) – The rebalancing date.
  • cashlevel (float) – The cash level to maintain during rebalancing.
Returns:

p – The rebalanced portfolio.

Return type:

alphapy.Portfolio

Notes

Warning

The portfolio management functions balance, kick_out, and stop_loss are not part of the main StockStream pipeline, and thus have not been thoroughly tested. Feel free to exercise the code and report any issues.

alphapy.portfolio.close_position(p, position, tdate)

Close the position and remove it from the portfolio.

Parameters:
  • p (alphapy.Portfolio) – Portfolio holding the position.
  • position (alphapy.Position) – Position to close.
  • tdate (datetime) – The date for pricing the closed position.
Returns:

p – Portfolio with the removed position.

Return type:

alphapy.Portfolio

alphapy.portfolio.delete_portfolio(p)

Delete the portfolio.

Parameters:p (alphapy.Portfolio) – Portfolio to delete.
Returns:None
Return type:None
alphapy.portfolio.deposit_portfolio(p, cash, tdate)

Deposit cash into a given portfolio.

Parameters:
  • p (alphapy.Portfolio) – Portfolio to accept the deposit.
  • cash (float) – Cash amount to deposit.
  • tdate (datetime) – The date of deposit.
Returns:

p – Portfolio with the added cash.

Return type:

alphapy.Portfolio

alphapy.portfolio.exec_trade(p, name, order, quantity, price, tdate)

Execute a trade for a portfolio.

Parameters:
  • p (alphapy.Portfolio) – Portfolio in which to trade.
  • name (str) – The symbol to trade.
  • order (alphapy.Orders) – Long or short trade for entry or exit.
  • quantity (int) – The quantity for the order.
  • price (float) – The execution price of the trade.
  • tdate (datetime) – The date and time of the trade.
Returns:

tsize – The executed trade size.

Return type:

float

Other Parameters:
 

Frame.frames (dict) – Dataframe for the price data.

alphapy.portfolio.gen_portfolio(model, system, group, tframe, startcap=100000, posby='close')

Create a portfolio from a trades frame.

Parameters:
  • model (alphapy.Model) – The model with specifications.
  • system (str) – Name of the system.
  • group (alphapy.Group) – The group of instruments in the portfolio.
  • tframe (pandas.DataFrame) – The input trade list from running the system.
  • startcap (float) – Starting capital.
  • posby (str) – The position sizing column in the price dataframe.
Returns:

p – The generated portfolio.

Return type:

alphapy.Portfolio

Raises:

MemoryError – Could not allocate Portfolio.

Notes

This function also generates the files required for analysis by the pyfolio package:

  • Returns File
  • Positions File
  • Transactions File
alphapy.portfolio.kick_out(p, tdate)

Trim the portfolio based on filter criteria.

To reduce a portfolio’s positions, AlphaPy can rank the positions on some criterion, such as open profit or net return. On a periodic basis, the worst performers can be culled from the portfolio.
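
A minimal sketch of the ranking step, assuming open profit as the criterion. The function name and the profit dictionary are hypothetical and are not part of the AlphaPy API.

```python
def worst_performers(profits, kopos):
    """Return the names of the `kopos` least profitable positions.

    `profits` maps a symbol to its open profit (hypothetical input).
    """
    ranked = sorted(profits, key=profits.get)  # ascending: worst first
    return ranked[:kopos]


profits = {"AAA": 120.0, "BBB": -45.0, "CCC": 10.0, "DDD": -200.0}
to_close = worst_performers(profits, kopos=2)
```

With these values, the two positions selected for removal are DDD and BBB.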

Parameters:
  • p (alphapy.Portfolio) – The portfolio for reducing positions.
  • tdate (datetime) – The date to trim the portfolio positions.
Returns:

p – The reduced portfolio.

Return type:

alphapy.Portfolio

Notes

Warning

The portfolio management functions kick_out, balance, and stop_loss are not part of the main StockStream pipeline, and thus have not been thoroughly tested. Feel free to exercise the code and report any issues.

alphapy.portfolio.portfolio_name(group_name, tag)

Return the name of the portfolio.

Parameters:
  • group_name (str) – The group represented in the portfolio.
  • tag (str) – A unique identifier.
Returns:

port_name – Portfolio name.

Return type:

str

alphapy.portfolio.remove_position(p, name)

Remove a position from a portfolio by name.

Parameters:
  • p (alphapy.Portfolio) – Portfolio with the current position.
  • name (str) – Unique identifier for the position, e.g., a stock symbol.
Returns:

p – Portfolio with the deleted position.

Return type:

alphapy.Portfolio

alphapy.portfolio.stop_loss(p, tdate)

Trim the portfolio based on stop-loss criteria.

Parameters:
  • p (alphapy.Portfolio) – The portfolio for reducing positions based on maxloss.
  • tdate (datetime) – The date to trim any underperforming positions.
Returns:

p – The reduced portfolio.

Return type:

alphapy.Portfolio

Notes

Warning

The portfolio management functions stop_loss, balance, and kick_out are not part of the main StockStream pipeline, and thus have not been thoroughly tested. Feel free to exercise the code and report any issues.
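
The stop-loss filter can be illustrated as follows, assuming that net return and maxloss are both expressed as fractions (e.g., 0.1 for 10%). That unit convention, the function name, and the input dictionary are assumptions for this sketch.

```python
def stopped_out(net_returns, maxloss):
    """Return the positions whose loss has reached the stop-loss threshold.

    `net_returns` maps a symbol to its net return as a fraction (assumption).
    """
    return [name for name, ret in net_returns.items() if ret <= -maxloss]


net_returns = {"AAA": 0.04, "BBB": -0.12, "CCC": -0.03}
to_close = stopped_out(net_returns, maxloss=0.1)
```

Only BBB has lost more than 10%, so it is the only position flagged for closing.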

alphapy.portfolio.update_portfolio(p, pos, trade)

Update the portfolio positions.

Parameters:
  • p (alphapy.Portfolio) – Portfolio holding the position.
  • pos (alphapy.Position) – Position to update.
  • trade (alphapy.Trade) – Trade for updating the position and portfolio.
Returns:

p – Portfolio with the revised position.

Return type:

alphapy.Portfolio

alphapy.portfolio.update_position(position, trade)

Add the new trade to the position and revalue.

Parameters:
  • position (alphapy.Position) – The position to be updated.
  • trade (alphapy.Trade) – Trade for updating the position.
Returns:

position – New value of the position.

Return type:

alphapy.Position

alphapy.portfolio.valuate_portfolio(p, tdate)

Value the portfolio based on the current positions.

Parameters:
  • p (alphapy.Portfolio) – Portfolio for calculating profit and return.
  • tdate (datetime) – The date of valuation.
Returns:

p – Portfolio with the new valuation.

Return type:

alphapy.Portfolio

alphapy.portfolio.valuate_position(position, tdate)

Valuate the position for the given date.

Parameters:
  • position (alphapy.Position) – The position to be valued.
  • tdate (datetime) – Date to value the position.
Returns:

position – New value of the position.

Return type:

alphapy.Position

Notes

An Example of Cost Basis

Date      Shares  Price  Amount
11/09/16    +100   10.0   1,000
12/14/16    +200   15.0   3,000
04/05/17    -500   20.0  10,000
All          800         14,000

The cost basis is calculated as the total value of all trades (14,000) divided by the total number of shares traded (800), so 14,000 / 800 = 17.5, and the net position is -200.
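
The arithmetic in this example can be checked with a short script. The tuple layout is an assumption made for the sketch; only the numbers come from the table above.

```python
# (date, signed shares, price) for each trade in the table.
trades = [
    ("11/09/16", 100, 10.0),
    ("12/14/16", 200, 15.0),
    ("04/05/17", -500, 20.0),
]

# Shares traded and amounts are counted in absolute terms.
total_shares = sum(abs(shares) for _, shares, _ in trades)
total_amount = sum(abs(shares) * price for _, shares, price in trades)

cost_basis = total_amount / total_shares
net_position = sum(shares for _, shares, _ in trades)
```

This gives 800 shares traded, a total amount of 14,000, a cost basis of 17.5, and a net position of -200.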

alphapy.portfolio.withdraw_portfolio(p, cash, tdate)

Withdraw cash from a given portfolio.

Parameters:
  • p (alphapy.Portfolio) – Portfolio to accept the withdrawal.
  • cash (float) – Cash amount to withdraw.
  • tdate (datetime) – The date of withdrawal.
Returns:

p – Portfolio with the withdrawn cash.

Return type:

alphapy.Portfolio

alphapy.space module

class alphapy.space.Space(subject='stock', schema='prices', fractal='1d')

Create a new namespace.

Parameters:
  • subject (str) – An identifier for a group of related items.
  • schema (str) – The data related to the subject.
  • fractal (str) – The time fractal of the data, e.g., “5m” or “1d”.
alphapy.space.space_name(subject, schema, fractal)

Get the namespace string.

Parameters:
  • subject (str) – An identifier for a group of related items.
  • schema (str) – The data related to the subject.
  • fractal (str) – The time fractal of the data, e.g., “5m” or “1d”.
Returns:

name – The joined namespace string.

Return type:

str
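
A sketch of how the three parts might be joined into one string. The underscore separator and the function name are assumptions; the documentation above states only that the result is the joined namespace string.

```python
def join_space(subject, schema, fractal, separator="_"):
    """Join the namespace parts into a single string (separator is assumed)."""
    return separator.join([subject, schema, fractal])


# Uses the default values shown in the Space signature.
name = join_space("stock", "prices", "1d")
```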

alphapy.sport_flow module

alphapy.sport_flow.add_features(frame, fdict, flen, prefix='')

Add new features to a dataframe with the specified dictionary.

Parameters:
  • frame (pandas.DataFrame) – The dataframe to extend with new features defined by fdict.
  • fdict (dict) – A dictionary of column names (key) and data types (value).
  • flen (int) – Length of frame.
  • prefix (str, optional) – Prepend all columns with a prefix.
Returns:

frame – The dataframe with the added features.

Return type:

pandas.DataFrame

alphapy.sport_flow.generate_delta_data(frame, fdict, prefix1, prefix2)

Subtract two similar columns to get the delta value.

Parameters:
  • frame (pandas.DataFrame) – The input model frame.
  • fdict (dict) – A dictionary of column names (key) and data types (value).
  • prefix1 (str) – The prefix of the first team.
  • prefix2 (str) – The prefix of the second team.
Returns:

frame – The completed dataframe with the delta data.

Return type:

pandas.DataFrame

alphapy.sport_flow.generate_team_frame(team, tf, home_team, away_team, window)

Calculate statistics for each team.

Parameters:
  • team (str) – The abbreviation for the team.
  • tf (pandas.DataFrame) – The initial team frame.
  • home_team (str) – Label for the home team.
  • away_team (str) – Label for the away team.
  • window (int) – The value for the rolling window to calculate means and sums.
Returns:

tf – The completed team frame.

Return type:

pandas.DataFrame

alphapy.sport_flow.get_day_offset(date_vector)

Compute the day offsets between games.

Parameters:date_vector (pandas.Series) – The date column.
Returns:day_offset – A vector of day offsets between adjacent dates.
Return type:pandas.Series
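
The idea of a day offset between adjacent games can be sketched with the standard library alone. The real function operates on a pandas Series; this list-based version, its name, and its use of None for the first game are assumptions for illustration.

```python
from datetime import date


def day_offsets(dates):
    """Return the number of days between each date and the previous one.

    The first date has no predecessor, so its offset is None (assumption).
    """
    if not dates:
        return []
    offsets = [None]
    for previous, current in zip(dates, dates[1:]):
        offsets.append((current - previous).days)
    return offsets


games = [date(2017, 10, 1), date(2017, 10, 4), date(2017, 10, 5)]
offsets = day_offsets(games)
```

For these three game dates, the offsets are None, 3, and 1.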
alphapy.sport_flow.get_losses(point_margin)

Determine a loss based on the point margin.

Parameters:point_margin (int) – The point margin can be positive, zero, or negative.
Returns:lost – If the point margin is less than 0, return 1, else 0.
Return type:int
alphapy.sport_flow.get_point_margin(row, score, opponent_score)

Get the point margin for a game.

Parameters:
  • row (pandas.Series) – The row of a game.
  • score (int) – The score for one team.
  • opponent_score (int) – The score for the other team.
Returns:

point_margin – The resulting point margin (0 if NaN).

Return type:

int

alphapy.sport_flow.get_series_diff(series)

Perform the difference operation on a series.

Parameters:series (pandas.Series) – The series for the diff operation.
Returns:new_series – The differenced series.
Return type:pandas.Series
alphapy.sport_flow.get_sport_config()

Read the configuration file for SportFlow.

Parameters:None (None)
Returns:specs – The parameters for controlling SportFlow.
Return type:dict
alphapy.sport_flow.get_streak(series, start_index, window)

Calculate the current streak.

Parameters:
  • series (pandas.Series) – A Boolean series for calculating streaks.
  • start_index (int) – The offset of the series to start counting.
  • window (int) – The period over which to count.
Returns:

streak – The count value for the current streak.

Return type:

int
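
One plausible reading of the streak calculation is to count consecutive True values backwards from start_index, looking back at most window entries. The sketch below implements that reading on a plain list; the exact boundary behavior of the library function is not documented here, so treat it as an assumption.

```python
def count_streak(flags, start_index, window):
    """Count consecutive True values ending at `start_index`.

    The count stops at the first False value, at the start of the
    sequence, or once `window` entries have been counted.
    """
    streak = 0
    i = start_index
    while i >= 0 and streak < window and flags[i]:
        streak += 1
        i -= 1
    return streak


wins = [True, False, True, True, True]
current = count_streak(wins, start_index=4, window=10)
capped = count_streak(wins, start_index=4, window=2)
```

Here the current streak is 3, and it is capped at 2 when the window is 2.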

alphapy.sport_flow.get_team_frame(game_frame, team, home, away)

Extract the frame for a given team from the game frame.

Parameters:
  • game_frame (pandas.DataFrame) – The game frame for a given season.
  • team (str) – The team abbreviation.
  • home (str) – The label of the home team column.
  • away (str) – The label of the away team column.
Returns:

team_frame – The extracted team frame.

Return type:

pandas.DataFrame

alphapy.sport_flow.get_ties(point_margin)

Determine a tie based on the point margin.

Parameters:point_margin (int) – The point margin can be positive, zero, or negative.
Returns:tied – If the point margin is equal to 0, return 1, else 0.
Return type:int
alphapy.sport_flow.get_wins(point_margin)

Determine a win based on the point margin.

Parameters:point_margin (int) – The point margin can be positive, zero, or negative.
Returns:won – If the point margin is greater than 0, return 1, else 0.
Return type:int
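
The rules for get_losses, get_ties, and get_wins are mutually exclusive, which the following sketch makes explicit. The combined helper is hypothetical; the library exposes three separate functions.

```python
def classify_margin(point_margin):
    """Return (won, lost, tied) indicators for a point margin."""
    won = 1 if point_margin > 0 else 0
    lost = 1 if point_margin < 0 else 0
    tied = 1 if point_margin == 0 else 0
    return won, lost, tied


results = [classify_margin(margin) for margin in (7, -3, 0)]
```

Exactly one of the three indicators is 1 for any margin.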
alphapy.sport_flow.insert_model_data(mf, mpos, mdict, tf, tpos, prefix)

Insert a row from the team frame into the model frame.

Parameters:
  • mf (pandas.DataFrame) – The model frame for a single season.
  • mpos (int) – The position in the model frame where to insert the row.
  • mdict (dict) – A dictionary of column names (key) and data types (value).
  • tf (pandas.DataFrame) – The team frame for a season.
  • tpos (int) – The position of the row in the team frame.
  • prefix (str) – The prefix to join with the mdict key.
Returns:

mf – The model frame with the inserted row.

Return type:

pandas.DataFrame

alphapy.sport_flow.main(args=None)

The main program for SportFlow.

Notes

  1. Initialize logging.
  2. Parse the command line arguments.
  3. Get the game configuration.
  4. Get the model configuration.
  5. Generate game frames for each season.
  6. Create statistics for each team.
  7. Merge the team frames into the final model frame.
  8. Run the AlphaPy pipeline.
Raises:ValueError – Training date must be before prediction date.

alphapy.system module

class alphapy.system.System(name, longentry, shortentry=None, longexit=None, shortexit=None, holdperiod=0, scale=False)

Bases: object

Create a new system. All systems are stored in System.systems. Duplicate names are not allowed.

Parameters:
  • name (str) – The system name.
  • longentry (str) – Name of the conditional feature for a long entry.
  • shortentry (str, optional) – Name of the conditional feature for a short entry.
  • longexit (str, optional) – Name of the conditional feature for a long exit.
  • shortexit (str, optional) – Name of the conditional feature for a short exit.
  • holdperiod (int, optional) – Holding period of a position.
  • scale (bool, optional) – Add to a position for a signal in the same direction.
Variables:

systems (dict) – Class variable for storing all known systems

Examples

>>> System('closer', 'hc', 'lc')
systems = {}
alphapy.system.long_short(system, name, space, quantity)

Run a long/short system.

A long/short system is always in the market. At any given time, either a long position is active, or a short position is active.
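
The "always in the market" behavior can be sketched as a simplified stop-and-reverse loop. The signal lists, the tuple format of the trades, and the choice to reverse with a single double-sized order are assumptions for illustration and may differ from how AlphaPy records entries and exits.

```python
def long_short_trades(long_signals, short_signals, quantity):
    """Generate (bar index, side, size) trades for a stop-and-reverse system."""
    trades = []
    position = 0  # +1 long, -1 short, 0 only before the first signal
    for i, (go_long, go_short) in enumerate(zip(long_signals, short_signals)):
        if go_long and position <= 0:
            # Cover an existing short (if any) and establish the long.
            size = quantity * (2 if position < 0 else 1)
            trades.append((i, "buy", size))
            position = 1
        elif go_short and position >= 0:
            # Sell an existing long (if any) and establish the short.
            size = quantity * (2 if position > 0 else 1)
            trades.append((i, "sell", size))
            position = -1
    return trades


long_signals = [True, False, False, True]
short_signals = [False, False, True, False]
tradelist = long_short_trades(long_signals, short_signals, quantity=100)
```

After the first entry, every trade flips the position, so the system is never flat.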

Parameters:
  • system (alphapy.System) – The long/short system to run.
  • name (str) – The symbol to trade.
  • space (alphapy.Space) – Namespace of instrument prices.
  • quantity (float) – The amount of the name to trade, e.g., number of shares.
Returns:

tradelist – List of trade entries and exits.

Return type:

list

Other Parameters:
 

Frame.frames (dict) – All of the data frames containing price data.

alphapy.system.open_range_breakout(name, space, quantity, t1=3, t2=12)

Run an Opening Range Breakout (ORB) system.

An ORB system is an intraday strategy that waits for price to “break out” in a certain direction after establishing an initial High-Low range. The timing of the trade is either time-based (e.g., 30 minutes after the Open) or price-based (e.g., 20% of the average daily range). Either the position is held until the end of the trading day, or the position is closed with a stop loss (e.g., the other side of the opening range).
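
A time-based breakout check can be sketched as follows. The bar format (high, low, close), the function name, and the use of the close to confirm the breakout are assumptions; the sketch does not model stop losses or the end-of-day exit.

```python
def first_breakout(bars, range_bars):
    """Return (bar index, direction) of the first close outside the opening range.

    `bars` is a list of (high, low, close) tuples for one trading day, and
    the opening range is built from the first `range_bars` bars.
    Returns None when price never leaves the range.
    """
    opening = bars[:range_bars]
    range_high = max(high for high, _low, _close in opening)
    range_low = min(low for _high, low, _close in opening)
    for i, (_high, _low, close) in enumerate(bars[range_bars:], start=range_bars):
        if close > range_high:
            return i, "long"
        if close < range_low:
            return i, "short"
    return None


bars = [
    (10.2, 9.8, 10.0),
    (10.4, 9.9, 10.3),
    (10.3, 10.0, 10.1),
    (10.6, 10.1, 10.5),
    (10.9, 10.4, 10.8),
]
signal = first_breakout(bars, range_bars=3)
```

The opening range here is 9.8 to 10.4, and the fourth bar closes above it, which triggers a long entry.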

Parameters:
  • name (str) – The symbol to trade.
  • space (alphapy.Space) – Namespace of instrument prices.
  • quantity (float) – The amount of the name to trade, e.g., number of shares.
Returns:

tradelist – List of trade entries and exits.

Return type:

list

Other Parameters:
 

Frame.frames (dict) – All of the data frames containing price data.

alphapy.system.run_system(model, system, group, system_params=None, quantity=1)

Run a system for a given group, creating a trades frame.

Parameters:
  • model (alphapy.Model) – The model object with specifications.
  • system (alphapy.System or str) – The system to run, either a long/short system or a local one identified by function name, e.g., ‘open_range_breakout’.
  • group (alphapy.Group) – The group of symbols to test.
  • system_params (list, optional) – The parameters for the given system.
  • quantity (float, optional) – The amount to trade for each symbol, e.g., the number of shares.
Returns:

tf – All of the trades for this group.

Return type:

pandas.DataFrame

alphapy.utilities module

alphapy.utilities.get_datestamp()

Returns today’s datestamp.

Returns:datestamp – The valid date string in YYYY-mm-dd format.
Return type:str
alphapy.utilities.np_store_data(data, dir_name, file_name, extension, separator)

Store NumPy data in a file.

Parameters:
  • data (numpy array) – The model component to store.
  • dir_name (str) – Full directory specification.
  • file_name (str) – Name of the file to write, excluding the extension.
  • extension (str) – File name extension, e.g., csv.
  • separator (str) – The delimiter between fields in the file.
Returns:

None

Return type:

None

alphapy.utilities.remove_list_items(elements, alist)

Remove one or more items from the given list.

Parameters:
  • elements (list) – The items to remove from the list alist.
  • alist (list) – The list to remove items from. Its items can be objects of any type.
Returns:

sublist – The subset of items after removal.

Return type:

list

Examples

>>> test_list = ['a', 'b', 'c', test_func]
>>> remove_list_items([test_func], test_list)  # ['a', 'b', 'c']
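
For a self-contained illustration, the documented behavior can be approximated with a list comprehension. The name remove_list_items_sketch and the placeholder test_func are hypothetical. This is a sketch of the documented semantics, not necessarily how alphapy implements the function.

```python
def remove_list_items_sketch(elements, alist):
    """Return the items of alist that do not appear in elements."""
    return [item for item in alist if item not in elements]


def test_func():
    """Placeholder so that the list can hold a non-string item."""


test_list = ['a', 'b', 'c', test_func]
sublist = remove_list_items_sketch([test_func], test_list)
```

The sketch builds a new list and leaves test_list unchanged.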
alphapy.utilities.subtract_days(date_string, ndays)

Subtract a number of days from a given date.

Parameters:
  • date_string (str) – An alphanumeric string in the format %Y-%m-%d.
  • ndays (int) – Number of days to subtract.
Returns:

new_date_string – The adjusted date string in the format %Y-%m-%d.

Return type:

str

Examples

>>> subtract_days('2017-11-10', 31)   # '2017-10-10'
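
The date arithmetic can be reproduced with the standard library. The function name subtract_days_sketch is hypothetical. The sketch assumes only the documented input and output format, %Y-%m-%d.

```python
from datetime import datetime, timedelta


def subtract_days_sketch(date_string, ndays):
    """Parse the date, step back ndays days, and format the result."""
    new_date = datetime.strptime(date_string, '%Y-%m-%d') - timedelta(days=ndays)
    return new_date.strftime('%Y-%m-%d')


result = subtract_days_sketch('2017-11-10', 31)
```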
alphapy.utilities.valid_date(date_string)

Determine whether or not the given string is a valid date.

Parameters:date_string (str) – An alphanumeric string in the format %Y-%m-%d.
Returns:date_string – The valid date string.
Return type:str
Raises:ValueError – Not a valid date.

Examples

>>> valid_date('2016-7-1')   # '2016-7-1'
>>> valid_date('345')        # ValueError: Not a valid date
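
One way to perform such a check with the standard library is sketched below. The function name valid_date_sketch is hypothetical. Returning the input string unchanged follows the documented return type (str), and the error message is illustrative.

```python
from datetime import datetime


def valid_date_sketch(date_string):
    """Return date_string if it parses as %Y-%m-%d, else raise ValueError."""
    try:
        datetime.strptime(date_string, '%Y-%m-%d')
    except ValueError:
        raise ValueError("Not a valid date: '{}'".format(date_string))
    return date_string


checked = valid_date_sketch('2016-7-1')
```

Note that strptime accepts months and days without zero padding, so '2016-7-1' passes the check.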
alphapy.utilities.valid_name(name)

Determine whether or not the given string is a valid alphanumeric string.

Parameters:name (str) – An alphanumeric identifier.
Returns:result – True if the name is valid, else False.
Return type:bool

Examples

>>> valid_name('alpha')   # True
>>> valid_name('!alpha')  # False
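
A comparable check can be sketched with a regular expression. The function name valid_name_sketch is hypothetical, and treating letters, digits, and underscores as the valid characters is an assumption. The exact rule alphapy applies is not stated here.

```python
import re


def valid_name_sketch(name):
    """Return True if name consists only of letters, digits, and underscores."""
    # Assumption: underscores are allowed alongside alphanumeric characters.
    return re.fullmatch(r'[A-Za-z0-9_]+', name) is not None


ok = valid_name_sketch('alpha')
bad = valid_name_sketch('!alpha')
```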