alphapy package
Submodules
alphapy.__main__ module
-
alphapy.__main__.
main
(args=None)¶ AlphaPy Main Program
Notes
Initialize logging.
Parse the command line arguments.
Get the model configuration.
Create the model object.
Call the main AlphaPy pipeline.
-
alphapy.__main__.
main_pipeline
(model)¶ AlphaPy Main Pipeline
- Parameters
model (alphapy.Model) – The model specifications for the pipeline.
- Returns
model – The final model.
- Return type
alphapy.Model
-
alphapy.__main__.
prediction_pipeline
(model)¶ AlphaPy Prediction Pipeline
- Parameters
model (alphapy.Model) – The model object for controlling the pipeline.
- Returns
None
- Return type
None
Notes
The saved model is loaded from disk, and predictions are made on the new testing data.
-
alphapy.__main__.
training_pipeline
(model)¶ AlphaPy Training Pipeline
- Parameters
model (alphapy.Model) – The model object for controlling the pipeline.
- Returns
model – The final results are stored in the model object.
- Return type
alphapy.Model
- Raises
KeyError – If the number of columns of the train and test data do not match, then this exception is raised.
alphapy.alias module
-
class
alphapy.alias.
Alias
(name, expr, replace=False)¶ Bases:
object
Create a new alias as a key-value pair. All aliases are stored in Alias.aliases. Duplicate keys or values are not allowed, unless the replace parameter is True.
- Parameters
name (str) – Alias key.
expr (str) – Alias value.
replace (bool, optional) – Replace the current key-value pair if it already exists.
- Variables
Alias.aliases (dict) – Class variable for storing all known aliases
Examples
>>> Alias('atr', 'ma_truerange')
>>> Alias('hc', 'higher_close')
-
aliases
= {}¶
-
alphapy.alias.
get_alias
(alias)¶ Find an alias value with the given key.
- Parameters
alias (str) – Key for finding the alias value.
- Returns
alias_value – Value for the corresponding key.
- Return type
str
Examples
>>> alias_value = get_alias('atr')
>>> alias_value = get_alias('hc')
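The registry behaviour described for Alias and get_alias can be sketched as below. This is a minimal illustration, not AlphaPy's implementation: the names AliasSketch and get_alias_sketch are hypothetical, and two behaviours are assumptions (a rejected duplicate is silently ignored, and an unknown key returns None).

```python
class AliasSketch:
    """Hypothetical stand-in for the alias registry described above."""

    # Class-level dictionary shared by every instance.
    aliases = {}

    def __init__(self, name, expr, replace=False):
        self.name = name
        self.expr = expr
        registry = AliasSketch.aliases
        duplicate = name in registry or expr in registry.values()
        # Assumption: a duplicate is ignored unless replace is True.
        if replace or not duplicate:
            registry[name] = expr


def get_alias_sketch(alias):
    """Return the value stored under the key, or None if it is unknown."""
    return AliasSketch.aliases.get(alias)


AliasSketch('atr', 'ma_truerange')
AliasSketch('atr', 'other_expression')                # ignored: key already exists
AliasSketch('atr', 'ma_truerange_14', replace=True)   # overwrites the entry
AliasSketch('hc', 'higher_close')
```

Keeping the dictionary on the class rather than on the instance is what makes every alias visible to a module-level lookup function.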
alphapy.analysis module
-
class
alphapy.analysis.
Analysis
(model, group)¶ Bases:
object
Create a new analysis for a group. All analyses are stored in Analysis.analyses. Duplicate keys are not allowed.
- Parameters
model (alphapy.Model) – Model object for the analysis.
group (alphapy.Group) – The group of members in the analysis.
- Variables
Analysis.analyses (dict) – Class variable for storing all known analyses
-
analyses
= {}¶
-
alphapy.analysis.
analysis_name
(gname, target)¶ Get the name of the analysis.
- Parameters
gname (str) – Group name.
target (str) – Target of the analysis.
- Returns
name – Name of the analysis.
- Return type
str
-
alphapy.analysis.
run_analysis
(analysis, lag_period, forecast_period, leaders, predict_history, splits=True)¶ Run an analysis for a given model and group.
First, the data are loaded for each member of the analysis group. Then, the target value is lagged for the forecast_period, and any leaders are lagged as well. Each frame is split along the predict_date from the analysis, and finally the train and test files are generated.
- Parameters
analysis (alphapy.Analysis) – The analysis to run.
lag_period (int) – The number of lagged features for the analysis.
forecast_period (int) – The period for forecasting the target of the analysis.
leaders (list) – The features that are contemporaneous with the target.
predict_history (int) – The number of periods required for lookback calculations.
splits (bool, optional) – If True, then the data for each member of the analysis group are in separate files.
- Returns
analysis – The completed analysis.
- Return type
alphapy.Analysis
alphapy.calendrical module
-
alphapy.calendrical.
biz_day_month
(rdate)¶ Calculate the business day of the month.
- Parameters
rdate (int) – RDate date format.
- Returns
bdm – Business day of month.
- Return type
int
-
alphapy.calendrical.
biz_day_week
(rdate)¶ Calculate the business day of the week.
- Parameters
rdate (int) – RDate date format.
- Returns
bdw – Business day of week.
- Return type
int
-
alphapy.calendrical.
christmas_day
(gyear, observed)¶ Get Christmas Day for a given year.
- Parameters
gyear (int) – Gregorian year.
observed (bool) – False if the exact date, True if the weekday.
- Returns
xmas – Christmas Day in RDate format.
- Return type
int
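The observed flag can be illustrated with a short sketch. It assumes the common US convention (a Saturday holiday is observed on the preceding Friday, a Sunday holiday on the following Monday) and assumes that RDate matches Python's proleptic Gregorian ordinal; neither assumption is confirmed by this reference, and christmas_day_sketch is a hypothetical name.

```python
from datetime import date


def christmas_day_sketch(gyear, observed):
    """Return Christmas Day as a proleptic Gregorian ordinal."""
    xmas = date(gyear, 12, 25)
    if observed:
        weekday = xmas.weekday()      # Monday == 0 ... Sunday == 6
        if weekday == 5:              # Saturday: assume preceding Friday
            xmas = date(gyear, 12, 24)
        elif weekday == 6:            # Sunday: assume following Monday
            xmas = date(gyear, 12, 26)
    return xmas.toordinal()


# 25 December 2016 fell on a Sunday, so the observed date is the 26th.
print(date.fromordinal(christmas_day_sketch(2016, True)))   # 2016-12-26
```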
-
alphapy.calendrical.
cinco_de_mayo
(gyear)¶ Get Cinco de Mayo for a given year.
- Parameters
gyear (int) – Gregorian year.
- Returns
cinco_de_mayo – Cinco de Mayo in RDate format.
- Return type
int
-
alphapy.calendrical.
day_of_week
(rdate)¶ Get the ordinal day of the week.
- Parameters
rdate (int) – RDate date format.
- Returns
dw – Ordinal day of the week.
- Return type
int
-
alphapy.calendrical.
day_of_year
(gyear, gmonth, gday)¶ Calculate the day number of the given calendar year.
- Parameters
gyear (int) – Gregorian year.
gmonth (int) – Gregorian month.
gday (int) – Gregorian day.
- Returns
dy – Day number of year in RDate format.
- Return type
int
-
alphapy.calendrical.
days_left_in_year
(gyear, gmonth, gday)¶ Calculate the number of days remaining in the calendar year.
- Parameters
gyear (int) – Gregorian year.
gmonth (int) – Gregorian month.
gday (int) – Gregorian day.
- Returns
days_left – Calendar days remaining in RDate format.
- Return type
int
-
alphapy.calendrical.
easter_day
(gyear)¶ Get Easter Day for a given year.
- Parameters
gyear (int) – Gregorian year.
- Returns
ed – Easter Day in RDate format.
- Return type
int
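One well-known way to compute Gregorian Easter is the anonymous Gregorian (Meeus/Jones/Butcher) algorithm, sketched below. Whether AlphaPy uses this method is not stated here, and the return value assumes that RDate matches Python's proleptic Gregorian ordinal; easter_day_sketch is a hypothetical name.

```python
from datetime import date


def easter_day_sketch(gyear):
    """Return Gregorian Easter Sunday as a proleptic Gregorian ordinal."""
    a = gyear % 19                       # position in the 19-year Metonic cycle
    b, c = divmod(gyear, 100)            # century and year within the century
    d, e = divmod(b, 4)
    f = (b + 8) // 25
    g = (b - f + 1) // 3
    h = (19 * a + b - d - g + 15) % 30   # epact-like term
    i, k = divmod(c, 4)
    l = (32 + 2 * e + 2 * i - h - k) % 7
    m = (a + 11 * h + 22 * l) // 451
    month, day = divmod(h + l - 7 * m + 114, 31)
    return date(gyear, month, day + 1).toordinal()


print(date.fromordinal(easter_day_sketch(2017)))   # 2017-04-16
```

Good Friday then follows as Easter Day minus two days.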
-
alphapy.calendrical.
expand_dates
(date_list)¶
-
alphapy.calendrical.
fathers_day
(gyear)¶ Get Father’s Day for a given year.
- Parameters
gyear (int) – Gregorian year.
- Returns
fathers_day – Father’s Day in RDate format.
- Return type
int
-
alphapy.calendrical.
first_kday
(k, gyear, gmonth, gday)¶ Calculate the first kday in RDate format.
- Parameters
k (int) – Day of the week.
gyear (int) – Gregorian year.
gmonth (int) – Gregorian month.
gday (int) – Gregorian day.
- Returns
fkd – first-kday in RDate format.
- Return type
int
-
alphapy.calendrical.
gdate_to_rdate
(gyear, gmonth, gday)¶ Convert Gregorian date to RDate format.
- Parameters
gyear (int) – Gregorian year.
gmonth (int) – Gregorian month.
gday (int) – Gregorian day.
- Returns
rdate – RDate date format.
- Return type
int
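A sketch of the conversion, assuming RDate is the Rata Die day count (day 1 is 1 January of year 1 in the proleptic Gregorian calendar). Under that assumption the result equals Python's date.toordinal(), which the last lines check. The function names are hypothetical.

```python
from datetime import date


def is_leap_sketch(gyear):
    """Gregorian leap-year rule."""
    return gyear % 4 == 0 and (gyear % 100 != 0 or gyear % 400 == 0)


def gdate_to_rdate_sketch(gyear, gmonth, gday):
    """Count the days from 1 January of year 1 (day 1) to the given date."""
    if gmonth <= 2:
        correction = 0          # January and February need no adjustment
    elif is_leap_sketch(gyear):
        correction = -1         # February had 29 days
    else:
        correction = -2         # February had 28 days
    prior = gyear - 1           # number of complete years elapsed
    return (365 * prior
            + prior // 4 - prior // 100 + prior // 400   # leap days so far
            + (367 * gmonth - 362) // 12                 # days in prior months
            + correction + gday)


rdate = gdate_to_rdate_sketch(2017, 1, 1)
print(rdate == date(2017, 1, 1).toordinal())   # True
```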
-
alphapy.calendrical.
get_holiday_names
()¶ Get the list of defined holidays.
- Returns
holidays – List of holiday names.
- Return type
list of str
-
alphapy.calendrical.
get_nth_kday_of_month
(gday, gmonth, gyear) Get the ordinal occurrence of a given day of the week within its month.
- Parameters
gday (int) – Gregorian day.
gmonth (int) – Gregorian month.
gyear (int) – Gregorian year.
- Returns
nth – Ordinal number of a given day’s occurrence within the month, for example, the third Friday of the month.
- Return type
int
-
alphapy.calendrical.
get_rdate
(row)¶ Extract RDate from a dataframe.
- Parameters
row (pandas.DataFrame) – Row of a dataframe containing year, month, and day.
- Returns
rdate – RDate date format.
- Return type
int
-
alphapy.calendrical.
good_friday
(gyear)¶ Get Good Friday for a given year.
- Parameters
gyear (int) – Gregorian year.
- Returns
gf – Good Friday in RDate format.
- Return type
int
-
alphapy.calendrical.
halloween
(gyear)¶ Get Halloween for a given year.
- Parameters
gyear (int) – Gregorian year.
- Returns
halloween – Halloween in RDate format.
- Return type
int
-
alphapy.calendrical.
independence_day
(gyear, observed)¶ Get Independence Day for a given year.
- Parameters
gyear (int) – Gregorian year.
observed (bool) – False if the exact date, True if the weekday.
- Returns
d4j – Independence Day in RDate format.
- Return type
int
-
alphapy.calendrical.
kday_after
(rdate, k) Calculate the first k-day (day of the week k) after a given RDate.
- Parameters
rdate (int) – RDate date format.
k (int) – Day of the week.
- Returns
kda – kday-after in RDate format.
- Return type
int
-
alphapy.calendrical.
kday_before
(rdate, k) Calculate the last k-day (day of the week k) before a given RDate.
- Parameters
rdate (int) – RDate date format.
k (int) – Day of the week.
- Returns
kdb – kday-before in RDate format.
- Return type
int
-
alphapy.calendrical.
kday_nearest
(rdate, k) Calculate the k-day (day of the week k) nearest a given RDate.
- Parameters
rdate (int) – RDate date format.
k (int) – Day of the week.
- Returns
kdn – kday-nearest in RDate format.
- Return type
int
-
alphapy.calendrical.
kday_on_after
(rdate, k) Calculate the first k-day (day of the week k) on or after a given RDate.
- Parameters
rdate (int) – RDate date format.
k (int) – Day of the week.
- Returns
kdoa – kday-on-or-after in RDate format.
- Return type
int
-
alphapy.calendrical.
kday_on_before
(rdate, k) Calculate the last k-day (day of the week k) on or before a given RDate.
- Parameters
rdate (int) – RDate date format.
k (int) – Day of the week.
- Returns
kdob – kday-on-or-before in RDate format.
- Return type
int
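The five kday functions can all be derived from kday_on_before, as in the standalone sketch below. It assumes that RDate matches Python's proleptic Gregorian ordinal and that k counts from Sunday = 0, which is consistent with k = 5 meaning Friday in the set_events example. These are re-implementations for illustration, not AlphaPy's code.

```python
from datetime import date


def kday_on_before(rdate, k):
    """Last day with day-of-week k (0 = Sunday) on or before rdate."""
    return rdate - (rdate - k) % 7


def kday_before(rdate, k):
    return kday_on_before(rdate - 1, k)


def kday_on_after(rdate, k):
    return kday_on_before(rdate + 6, k)


def kday_after(rdate, k):
    return kday_on_before(rdate + 7, k)


def kday_nearest(rdate, k):
    return kday_on_before(rdate + 3, k)


FRIDAY = 5
wednesday = date(2017, 1, 18).toordinal()
print(date.fromordinal(kday_before(wednesday, FRIDAY)))   # 2017-01-13
print(date.fromordinal(kday_after(wednesday, FRIDAY)))    # 2017-01-20
```

Shifting the argument by a fixed offset and reusing one primitive keeps the modular arithmetic in a single place.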
-
alphapy.calendrical.
labor_day
(gyear)¶ Get Labor Day for a given year.
- Parameters
gyear (int) – Gregorian year.
- Returns
lday – Labor Day in RDate format.
- Return type
int
-
alphapy.calendrical.
last_kday
(k, gyear, gmonth, gday)¶ Calculate the last kday in RDate format.
- Parameters
k (int) – Day of the week.
gyear (int) – Gregorian year.
gmonth (int) – Gregorian month.
gday (int) – Gregorian day.
- Returns
lkd – last-kday in RDate format.
- Return type
int
-
alphapy.calendrical.
leap_year
(gyear)¶ Determine if this is a Gregorian leap year.
- Parameters
gyear (int) – Gregorian year.
- Returns
leap_year – True if a Gregorian leap year, else False.
- Return type
bool
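The Gregorian rule is short enough to state in full: a year is a leap year if it is divisible by 4, except century years, which must also be divisible by 400. The sketch below uses a hypothetical name and is not taken from AlphaPy.

```python
import calendar


def leap_year_sketch(gyear):
    """Return True if gyear is a Gregorian leap year."""
    return gyear % 4 == 0 and (gyear % 100 != 0 or gyear % 400 == 0)


print(leap_year_sketch(1900), leap_year_sketch(2000), leap_year_sketch(2016))
# False True True

# The standard library offers the same test.
print(calendar.isleap(2016))   # True
```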
-
alphapy.calendrical.
memorial_day
(gyear)¶ Get Memorial Day for a given year.
- Parameters
gyear (int) – Gregorian year.
- Returns
md – Memorial Day in RDate format.
- Return type
int
-
alphapy.calendrical.
mlk_day
(gyear)¶ Get Martin Luther King Day for a given year.
- Parameters
gyear (int) – Gregorian year.
- Returns
mlkday – Martin Luther King Day in RDate format.
- Return type
int
-
alphapy.calendrical.
mothers_day
(gyear)¶ Get Mother’s Day for a given year.
- Parameters
gyear (int) – Gregorian year.
- Returns
mothers_day – Mother’s Day in RDate format.
- Return type
int
-
alphapy.calendrical.
new_years_day
(gyear, observed)¶ Get New Year’s day for a given year.
- Parameters
gyear (int) – Gregorian year.
observed (bool) – False if the exact date, True if the weekday.
- Returns
nyday – New Year’s Day in RDate format.
- Return type
int
-
alphapy.calendrical.
next_event
(rdate, events)¶ Find the next event after a given date.
- Parameters
rdate (int) – RDate date format.
events (list of RDate (int)) – Monthly events in RDate format.
- Returns
event – Next event in RDate format.
- Return type
RDate (int)
-
alphapy.calendrical.
next_holiday
(rdate, holidays)¶ Find the next holiday after a given date.
- Parameters
rdate (int) – RDate date format.
holidays (dict of RDate (int)) – Holidays in RDate format.
- Returns
holiday – Next holiday in RDate format.
- Return type
RDate (int)
-
alphapy.calendrical.
nth_bizday
(n, gyear, gmonth)¶ Calculate the nth business day in a month.
- Parameters
n (int) – Number of the business day to get.
gyear (int) – Gregorian year.
gmonth (int) – Gregorian month.
- Returns
bizday – Nth business day of a given month in RDate format.
- Return type
int
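A minimal sketch of finding the nth business day follows. It treats every Monday-to-Friday date as a business day and ignores holidays, which may differ from AlphaPy's behaviour. It also assumes that RDate matches Python's proleptic Gregorian ordinal; nth_bizday_sketch is a hypothetical name.

```python
from datetime import date, timedelta


def nth_bizday_sketch(n, gyear, gmonth):
    """Return the nth weekday of the month as a proleptic Gregorian ordinal."""
    day = date(gyear, gmonth, 1)
    count = 0
    while day.month == gmonth:
        if day.weekday() < 5:          # Monday (0) through Friday (4)
            count += 1
            if count == n:
                return day.toordinal()
        day += timedelta(days=1)
    raise ValueError("the month has fewer than n business days")


# 1 January 2017 was a Sunday, so the first weekday is the 2nd.
print(date.fromordinal(nth_bizday_sketch(1, 2017, 1)))   # 2017-01-02
```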
-
alphapy.calendrical.
nth_kday
(n, k, gyear, gmonth, gday)¶ Calculate the nth-kday in RDate format.
- Parameters
n (int) – Occurrence of a given day counting in either direction.
k (int) – Day of the week.
gyear (int) – Gregorian year.
gmonth (int) – Gregorian month.
gday (int) – Gregorian day.
- Returns
nthkday – nth-kday in RDate format.
- Return type
int
-
alphapy.calendrical.
presidents_day
(gyear)¶ Get President’s Day for a given year.
- Parameters
gyear (int) – Gregorian year.
- Returns
prezday – President’s Day in RDate format.
- Return type
int
-
alphapy.calendrical.
previous_event
(rdate, events)¶ Find the previous event before a given date.
- Parameters
rdate (int) – RDate date format.
events (list of RDate (int)) – Monthly events in RDate format.
- Returns
event – Previous event in RDate format.
- Return type
RDate (int)
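Searching a list of events for the next or previous date is a natural fit for binary search. The sketch below uses the standard bisect module on plain integers. It assumes strict comparisons (an event on rdate itself is neither next nor previous) and returns None when no event qualifies; both choices are assumptions, and the function names are hypothetical.

```python
from bisect import bisect_left, bisect_right


def next_event_sketch(rdate, events):
    """First event strictly after rdate, or None."""
    ordered = sorted(events)
    i = bisect_right(ordered, rdate)
    return ordered[i] if i < len(ordered) else None


def previous_event_sketch(rdate, events):
    """Last event strictly before rdate, or None."""
    ordered = sorted(events)
    i = bisect_left(ordered, rdate)
    return ordered[i - 1] if i > 0 else None


events = [736349, 736377, 736405]
print(next_event_sketch(736377, events))       # 736405
print(previous_event_sketch(736377, events))   # 736349
```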
-
alphapy.calendrical.
previous_holiday
(rdate, holidays)¶ Find the previous holiday before a given date.
- Parameters
rdate (int) – RDate date format.
holidays (dict of RDate (int)) – Holidays in RDate format.
- Returns
holiday – Previous holiday in RDate format.
- Return type
RDate (int)
-
alphapy.calendrical.
rdate_to_gdate
(rdate)¶ Convert RDate format to Gregorian date format.
- Parameters
rdate (int) – RDate date format.
- Returns
gyear (int) – Gregorian year.
gmonth (int) – Gregorian month.
gday (int) – Gregorian day.
-
alphapy.calendrical.
rdate_to_gyear
(rdate)¶ Convert RDate format to Gregorian year.
- Parameters
rdate (int) – RDate date format.
- Returns
gyear – Gregorian year.
- Return type
int
-
alphapy.calendrical.
saint_patricks_day
(gyear)¶ Get Saint Patrick’s day for a given year.
- Parameters
gyear (int) – Gregorian year.
- Returns
patricks – Saint Patrick’s Day in RDate format.
- Return type
int
-
alphapy.calendrical.
set_events
(n, k, gyear, gday)¶ Define monthly events for a given year.
- Parameters
n (int) – Occurrence of a given day counting in either direction.
k (int) – Day of the week.
gyear (int) – Gregorian year for the events.
gday (int) – Gregorian day representing the first day to consider.
- Returns
events – Monthly events in RDate format.
- Return type
list of RDate (int)
Example
>>> # Options Expiration (Third Friday of every month)
>>> set_events(3, 5, 2017, 1)
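The example above can be reproduced with a short standalone sketch of nth_kday and set_events. It assumes that RDate matches Python's proleptic Gregorian ordinal and that k counts from Sunday = 0 (so 5 is Friday). The names carry a _sketch suffix because they are illustrations, not AlphaPy's code.

```python
from datetime import date


def nth_kday_sketch(n, k, gyear, gmonth, gday):
    """nth occurrence of day-of-week k counted from the anchor date.

    Positive n counts forward from the anchor (inclusive);
    negative n counts backward from the anchor (inclusive).
    """
    rdate = date(gyear, gmonth, gday).toordinal()
    if n > 0:
        before = (rdate - 1) - (rdate - 1 - k) % 7   # k-day strictly before
        return before + 7 * n
    after = (rdate + 7) - (rdate + 7 - k) % 7        # k-day strictly after
    return after + 7 * n


def set_events_sketch(n, k, gyear, gday):
    """One event per month of gyear."""
    return [nth_kday_sketch(n, k, gyear, month, gday) for month in range(1, 13)]


# Third Friday of every month in 2017.
expirations = set_events_sketch(3, 5, 2017, 1)
print(date.fromordinal(expirations[0]))   # 2017-01-20
```

A negative n works from the end of the month, for example nth_kday_sketch(-1, 1, 2017, 5, 31) for the last Monday of May.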
-
alphapy.calendrical.
set_holidays
(gyear, observe) Determine the holidays for a given year.
- Parameters
gyear (int) – Gregorian year.
observe (bool) – True to get the observed date, otherwise False.
- Returns
holidays – Set of holidays in RDate format for a given year.
- Return type
dict of int
-
alphapy.calendrical.
subtract_dates
(gyear1, gmonth1, gday1, gyear2, gmonth2, gday2)¶ Calculate the difference between two Gregorian dates.
- Parameters
gyear1 (int) – Gregorian year of first date.
gmonth1 (int) – Gregorian month of first date.
gday1 (int) – Gregorian day of first date.
gyear2 (int) – Gregorian year of successive date.
gmonth2 (int) – Gregorian month of successive date.
gday2 (int) – Gregorian day of successive date.
- Returns
delta_days – Difference in days in RDate format.
- Return type
int
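Once both dates are expressed as day counts, the difference is a single subtraction. The sketch below assumes that RDate matches Python's proleptic Gregorian ordinal and that the result is the second date minus the first; the sign convention is an assumption, and subtract_dates_sketch is a hypothetical name.

```python
from datetime import date


def subtract_dates_sketch(gyear1, gmonth1, gday1, gyear2, gmonth2, gday2):
    """Days from the first date to the second (negative if the second is earlier)."""
    first = date(gyear1, gmonth1, gday1).toordinal()
    second = date(gyear2, gmonth2, gday2).toordinal()
    return second - first


print(subtract_dates_sketch(2016, 12, 25, 2017, 1, 1))   # 7
print(subtract_dates_sketch(2016, 1, 1, 2017, 1, 1))     # 366 (2016 is a leap year)
```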
-
alphapy.calendrical.
thanksgiving_day
(gyear)¶ Get Thanksgiving Day for a given year.
- Parameters
gyear (int) – Gregorian year.
- Returns
tday – Thanksgiving Day in RDate format.
- Return type
int
-
alphapy.calendrical.
valentines_day
(gyear)¶ Get Valentine’s day for a given year.
- Parameters
gyear (int) – Gregorian year.
- Returns
valentines – Valentine’s Day in RDate format.
- Return type
int
-
alphapy.calendrical.
veterans_day
(gyear, observed)¶ Get Veteran’s day for a given year.
- Parameters
gyear (int) – Gregorian year.
observed (bool) – False if the exact date, True if the weekday.
- Returns
veterans – Veteran’s Day in RDate format.
- Return type
int
alphapy.data module
-
alphapy.data.
convert_data
(df, index_column, intraday_data)¶ Convert the market data frame to canonical format.
- Parameters
df (pandas.DataFrame) – The market dataframe.
index_column (str) – The name of the index column.
intraday_data (bool) – Flag set to True if the frame contains intraday data.
- Returns
df – The canonical dataframe with date/time index.
- Return type
pandas.DataFrame
-
alphapy.data.
enhance_intraday_data
(df)¶ Add columns to the intraday dataframe.
- Parameters
df (pandas.DataFrame) – The intraday dataframe.
- Returns
df – The dataframe with bar number and end-of-day columns.
- Return type
pandas.DataFrame
-
alphapy.data.
get_data
(model, partition)¶ Get data for the given partition.
- Parameters
model (alphapy.Model) – The model object describing the data.
partition (alphapy.Partition) – Reference to the dataset.
- Returns
X (pandas.DataFrame) – The feature set.
y (pandas.Series) – The array of target values, if available.
-
alphapy.data.
get_google_data
(schema, subschema, symbol, intraday_data, data_fractal, from_date, to_date, lookback_period)¶ Get data from Google.
- Parameters
schema (str) – The schema (including any subschema) for this data feed.
subschema (str) – Any subschema for this data feed.
symbol (str) – A valid stock symbol.
intraday_data (bool) – If True, then get intraday data.
data_fractal (str) – Pandas offset alias.
from_date (str) – Starting date for symbol retrieval.
to_date (str) – Ending date for symbol retrieval.
lookback_period (int) – The number of periods of data to retrieve.
- Returns
df – The dataframe containing the market data.
- Return type
pandas.DataFrame
-
alphapy.data.
get_google_intraday_data
(symbol, lookback_period, fractal)¶ Get Google Finance intraday data.
We get intraday data from the Google Finance API, even though it is not officially supported. You can retrieve a maximum of 50 days of history, so you may want to build your own database for more extensive backtesting.
- Parameters
symbol (str) – A valid stock symbol.
lookback_period (int) – The number of days of intraday data to retrieve, capped at 50.
fractal (str) – The intraday frequency, e.g., “5m” for 5-minute data.
- Returns
df – The dataframe containing the intraday data.
- Return type
pandas.DataFrame
-
alphapy.data.
get_iex_data
(schema, subschema, symbol, intraday_data, data_fractal, from_date, to_date, lookback_period)¶ Get data from IEX.
- Parameters
schema (str) – The schema (including any subschema) for this data feed.
subschema (str) – Any subschema for this data feed.
symbol (str) – A valid stock symbol.
intraday_data (bool) – If True, then get intraday data.
data_fractal (str) – Pandas offset alias.
from_date (str) – Starting date for symbol retrieval.
to_date (str) – Ending date for symbol retrieval.
lookback_period (int) – The number of periods of data to retrieve.
- Returns
df – The dataframe containing the market data.
- Return type
pandas.DataFrame
-
alphapy.data.
get_market_data
(model, market_specs, group, lookback_period, intraday_data=False)¶ Get data from an external feed.
- Parameters
model (alphapy.Model) – The model object describing the data.
market_specs (dict) – The specifications for controlling the MarketFlow pipeline.
group (alphapy.Group) – The group of symbols.
lookback_period (int) – The number of periods of data to retrieve.
intraday_data (bool) – If True, then get intraday data.
- Returns
n_periods – The maximum number of periods actually retrieved.
- Return type
int
-
alphapy.data.
get_pandas_data
(schema, subschema, symbol, intraday_data, data_fractal, from_date, to_date, lookback_period)¶ Get Pandas Web Reader data.
- Parameters
schema (str) – The schema (including any subschema) for this data feed.
subschema (str) – Any subschema for this data feed.
symbol (str) – A valid stock symbol.
intraday_data (bool) – If True, then get intraday data.
data_fractal (str) – Pandas offset alias.
from_date (str) – Starting date for symbol retrieval.
to_date (str) – Ending date for symbol retrieval.
lookback_period (int) – The number of periods of data to retrieve.
- Returns
df – The dataframe containing the market data.
- Return type
pandas.DataFrame
-
alphapy.data.
get_quandl_data
(schema, subschema, symbol, intraday_data, data_fractal, from_date, to_date, lookback_period)¶ Get Quandl data.
- Parameters
schema (str) – The schema for this data feed.
subschema (str) – Any subschema for this data feed.
symbol (str) – A valid stock symbol.
intraday_data (bool) – If True, then get intraday data.
data_fractal (str) – Pandas offset alias.
from_date (str) – Starting date for symbol retrieval.
to_date (str) – Ending date for symbol retrieval.
lookback_period (int) – The number of periods of data to retrieve.
- Returns
df – The dataframe containing the market data.
- Return type
pandas.DataFrame
-
alphapy.data.
get_yahoo_data
(schema, subschema, symbol, intraday_data, data_fractal, from_date, to_date, lookback_period)¶ Get Yahoo data.
- Parameters
schema (str) – The schema (including any subschema) for this data feed.
subschema (str) – Any subschema for this data feed.
symbol (str) – A valid stock symbol.
intraday_data (bool) – If True, then get intraday data.
data_fractal (str) – Pandas offset alias.
from_date (str) – Starting date for symbol retrieval.
to_date (str) – Ending date for symbol retrieval.
lookback_period (int) – The number of periods of data to retrieve.
- Returns
df – The dataframe containing the market data.
- Return type
pandas.DataFrame
-
alphapy.data.
sample_data
(model)¶ Sample the training data.
Sampling is configured in the model.yml file (data:sampling:method). You can learn more about resampling techniques here [IMB].
- Parameters
model (alphapy.Model) – The model object describing the data.
- Returns
model – The model object with the sampled data.
- Return type
alphapy.Model
-
alphapy.data.
shuffle_data
(model)¶ Randomly shuffle the training data.
- Parameters
model (alphapy.Model) – The model object describing the data.
- Returns
model – The model object with the shuffled data.
- Return type
alphapy.Model
alphapy.estimators module
-
class
alphapy.estimators.
Estimator
(algorithm, model_type, estimator, grid)¶ Bases:
object
Store information about each estimator.
- Parameters
algorithm (str) – Abbreviation representing the given algorithm.
model_type (enum ModelType) – The machine learning task for this algorithm.
estimator (function) – A scikit-learn, TensorFlow, or XGBoost function.
grid (dict) – The dictionary of hyperparameters for grid search.
-
alphapy.estimators.
create_keras_model
(nlayers, layer1=None, layer2=None, layer3=None, layer4=None, layer5=None, layer6=None, layer7=None, layer8=None, layer9=None, layer10=None, optimizer=None, loss=None, metrics=None)¶ Create a Keras Sequential model.
- Parameters
nlayers (int) – Number of layers of the Sequential model.
layer1…layer10 (str) – Ordered layers of the Sequential model.
optimizer (str) – Compiler optimizer for the Sequential model.
loss (str) – Compiler loss function for the Sequential model.
metrics (str) – Compiler evaluation metric for the Sequential model.
- Returns
model – Compiled Keras Sequential Model.
- Return type
keras.models.Sequential
-
alphapy.estimators.
find_optional_packages
()¶
-
alphapy.estimators.
get_algos_config
(cfg_dir)¶ Read the algorithms configuration file.
- Parameters
cfg_dir (str) – The directory where the configuration file algos.yml is stored.
- Returns
specs – The specifications for determining which algorithms to run.
- Return type
dict
-
alphapy.estimators.
get_estimators
(model) Define all the AlphaPy estimators based on the contents of the algos.yml file.
- Parameters
model (alphapy.Model) – The model object containing global AlphaPy parameters.
- Returns
estimators – All of the estimators required for running the pipeline.
- Return type
dict
alphapy.features module
-
alphapy.features.
apply_transform
(fname, df, fparams)¶ Apply a transform function to a column of the dataframe.
- Parameters
fname (str) – Name of the column to be treated in the dataframe df.
df (pandas.DataFrame) – Dataframe containing the column fname.
fparams (list) – The module, function, and parameter list of the transform function.
- Returns
new_features – The set of features after applying a transform function.
- Return type
pandas.DataFrame
-
alphapy.features.
apply_transforms
(model, X)¶ Apply special functions to the original features.
- Parameters
model (alphapy.Model) – Model specifications indicating any transforms.
X (pandas.DataFrame) – Combined train and test data, or just prediction data.
- Returns
all_features – All features, including transforms.
- Return type
pandas.DataFrame
- Raises
IndexError – The number of transform rows must match the number of rows in X.
-
alphapy.features.
create_clusters
(features, model)¶ Cluster the given features.
- Parameters
features (numpy array) – The features to cluster.
model (alphapy.Model) – The model object with the clustering parameters.
- Returns
cfeatures (numpy array) – The calculated clusters.
cnames (list) – The cluster feature names.
References
You can find more information on clustering here [CLUS].
-
alphapy.features.
create_crosstabs
(model)¶ Create cross-tabulations for categorical variables.
- Parameters
model (alphapy.Model) – The model object containing the data.
- Returns
model – The model object with the updated feature map.
- Return type
alphapy.Model
-
alphapy.features.
create_features
(model, X, X_train, X_test, y_train)¶ Create features for the train and test set.
- Parameters
model (alphapy.Model) – Model object with the feature specifications.
X (pandas.DataFrame) – Combined train and test data.
X_train (pandas.DataFrame) – Training data.
X_test (pandas.DataFrame) – Testing data.
y_train (pandas.DataFrame) – Target variable for training data.
- Returns
all_features – The new features.
- Return type
numpy array
- Raises
TypeError – Unrecognized data type.
-
alphapy.features.
create_interactions
(model, X)¶ Create feature interactions based on the model specifications.
- Parameters
model (alphapy.Model) – Model object with train and test data.
X (numpy array) – Feature Matrix.
- Returns
all_features – The new interaction features.
- Return type
numpy array
- Raises
TypeError – Unknown model type when creating interactions.
-
alphapy.features.
create_isomap_features
(features, model)¶ Create Isomap features.
- Parameters
features (numpy array) – The input features.
model (alphapy.Model) – The model object with the Isomap parameters.
- Returns
ifeatures (numpy array) – The Isomap features.
inames (list) – The Isomap feature names.
Notes
Isomaps are very memory-intensive. Your process will be killed if you run out of memory.
References
You can find more information on Isomap here [ISO].
-
alphapy.features.
create_numpy_features
(base_features, sentinel)¶ Calculate the sum, mean, standard deviation, and variance of each row.
- Parameters
base_features (numpy array) – The feature dataframe.
sentinel (float) – The number to be imputed for NaN values.
- Returns
np_features (numpy array) – The calculated NumPy features.
np_fnames (list) – The NumPy feature names.
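The row-wise statistics can be sketched without NumPy using the standard statistics module. This is a swapped-in technique for illustration only. It assumes that NaN values are replaced by the sentinel before the statistics are computed and that the standard deviation and variance are population values. The function name and the feature names are hypothetical.

```python
import math
import statistics


def row_statistics_sketch(base_features, sentinel):
    """Return per-row [sum, mean, std, var] and the matching feature names."""
    rows = []
    for row in base_features:
        # Assumption: impute the sentinel for NaN before aggregating.
        clean = [sentinel if math.isnan(v) else v for v in row]
        rows.append([
            sum(clean),
            statistics.fmean(clean),
            statistics.pstdev(clean),      # population standard deviation
            statistics.pvariance(clean),   # population variance
        ])
    names = ['row_sum', 'row_mean', 'row_std', 'row_var']   # hypothetical names
    return rows, names


features, fnames = row_statistics_sketch([[2.0, 4.0, float('nan')]], 0.0)
print(fnames)
print(features[0][:2])   # [6.0, 2.0]
```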
-
alphapy.features.
create_pca_features
(features, model)¶ Apply Principal Component Analysis (PCA) to the features.
- Parameters
features (numpy array) – The input features.
model (alphapy.Model) – The model object with the PCA parameters.
- Returns
pfeatures (numpy array) – The PCA features.
pnames (list) – The PCA feature names.
References
You can find more information on Principal Component Analysis here [PCA].
-
alphapy.features.
create_scipy_features
(base_features, sentinel)¶ Calculate the skew, kurtosis, and other statistical features for each row.
- Parameters
base_features (numpy array) – The feature dataframe.
sentinel (float) – The number to be imputed for NaN values.
- Returns
sp_features (numpy array) – The calculated SciPy features.
sp_fnames (list) – The SciPy feature names.
-
alphapy.features.
create_tsne_features
(features, model)¶ Create t-SNE features.
- Parameters
features (numpy array) – The input features.
model (alphapy.Model) – The model object with the t-SNE parameters.
- Returns
tfeatures (numpy array) – The t-SNE features.
tnames (list) – The t-SNE feature names.
References
You can find more information on the t-SNE technique here [TSNE].
-
alphapy.features.
drop_features
(X, drop)¶ Drop any specified features.
- Parameters
X (pandas.DataFrame) – The dataframe containing the features.
drop (list) – The list of features to remove from X.
- Returns
X – The dataframe without the dropped features.
- Return type
pandas.DataFrame
-
alphapy.features.
float_factor
(x, rounding)¶ Convert a floating point number to a factor.
- Parameters
x (float) – The value to convert to a factor.
rounding (int) – The number of places to round.
- Returns
ffactor – The resulting factor.
- Return type
int
-
alphapy.features.
get_factors
(model, X_train, X_test, y_train, fnum, fname, nvalues, dtype, encoder, rounding, sentinel)¶ Convert the original feature to a factor.
- Parameters
model (alphapy.Model) – Model object with the feature specifications.
X_train (pandas.DataFrame) – Training dataframe containing the column fname.
X_test (pandas.DataFrame) – Testing dataframe containing the column fname.
y_train (pandas.Series) – Training series for target variable.
fnum (int) – Feature number, strictly for logging purposes.
fname (str) – Name of the text column in the dataframe df.
nvalues (int) – The number of unique values.
dtype (str) – The values 'float64', 'int64', or 'bool'.
encoder (alphapy.features.Encoders) – Type of encoder to apply.
rounding (int) – Number of places to round.
sentinel (float) – The number to be imputed for NaN values.
- Returns
all_features (numpy array) – The features that have been transformed to factors.
all_fnames (list) – The feature names for the encodings.
-
alphapy.features.
get_numerical_features
(fnum, fname, df, nvalues, dt, sentinel, logt, plevel)¶ Transform numerical features with imputation and possibly log-transformation.
- Parameters
fnum (int) – Feature number, strictly for logging purposes.
fname (str) – Name of the numerical column in the dataframe df.
df (pandas.DataFrame) – Dataframe containing the column fname.
nvalues (int) – The number of unique values.
dt (str) – The values 'float64', 'int64', or 'bool'.
sentinel (float) – The number to be imputed for NaN values.
logt (bool) – If True, then log-transform numerical values.
plevel (float) – The p-value threshold to test if a feature is normally distributed.
- Returns
new_values (numpy array) – The set of imputed and transformed features.
new_fnames (list) – The new feature name(s) for the numerical variable.
-
alphapy.features.
get_polynomials
(features, poly_degree)¶ Generate interactions that are products of distinct features.
- Parameters
features (pandas.DataFrame) – Dataframe containing the features for generating interactions.
poly_degree (int) – The degree of the polynomial features.
- Returns
poly_features (numpy array) – The interaction features only.
poly_fnames (list) – List of polynomial feature names.
References
You can find more information on polynomial interactions here [POLY].
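"Products of distinct features" can be made concrete with itertools.combinations. The sketch below works on plain lists rather than a dataframe and is not AlphaPy's implementation. The function name and the '*' naming scheme for the generated features are hypothetical.

```python
from itertools import combinations
from math import prod


def interaction_features_sketch(rows, names, poly_degree):
    """Products of 2..poly_degree distinct columns, without powers of one column."""
    indices = range(len(names))
    combos = [c for degree in range(2, poly_degree + 1)
              for c in combinations(indices, degree)]
    poly_fnames = ['*'.join(names[i] for i in c) for c in combos]
    poly_features = [[prod(row[i] for i in c) for c in combos] for row in rows]
    return poly_features, poly_fnames


features, fnames = interaction_features_sketch([[1, 2, 3]], ['a', 'b', 'c'], 2)
print(fnames)     # ['a*b', 'a*c', 'b*c']
print(features)   # [[2, 3, 6]]
```

Because each combination uses every column at most once, terms such as a*a never appear, which is what separates interaction features from a full polynomial expansion.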
-
alphapy.features.
get_text_features
(fnum, fname, df, nvalues, vectorize, ngrams_max)¶ Transform text features with count vectorization and TF-IDF, or alternatively factorization.
- Parameters
fnum (int) – Feature number, strictly for logging purposes
fname (str) – Name of the text column in the dataframe df.
df (pandas.DataFrame) – Dataframe containing the column fname.
nvalues (int) – The number of unique values.
vectorize (bool) – If True, then attempt count vectorization.
ngrams_max (int) – The maximum number of n-grams for count vectorization.
- Returns
new_features (numpy array) – The vectorized or factorized text features.
new_fnames (list) – The new feature name(s) for the text variable.
References
To use count vectorization and TF-IDF, you can find more information here [TFE].
-
alphapy.features.
impute_values
(feature, dt, sentinel)¶ Impute values for a given data type. The median strategy is applied for floating point values, and the most frequent strategy is applied for integer or Boolean values.
- Parameters
feature (pandas.Series or numpy.array) – The feature for imputation.
dt (str) – The values 'float64', 'int64', or 'bool'.
sentinel (float) – The number to be imputed for NaN values.
- Returns
imputed – The feature after imputation.
- Return type
numpy.array
- Raises
TypeError – Data type dt is invalid for imputation.
References
You can find more information on feature imputation here [IMP].
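The strategy selection described above (median for floats, most frequent value for integers and Booleans, TypeError otherwise) can be sketched with the standard library. This is a swapped-in illustration on plain lists, not AlphaPy's implementation. Using the sentinel only when no observed value exists is an assumption, and impute_values_sketch is a hypothetical name.

```python
import math
import statistics


def _is_missing(value):
    return isinstance(value, float) and math.isnan(value)


def impute_values_sketch(feature, dt, sentinel):
    """Fill NaN entries using a strategy chosen by data type."""
    observed = [v for v in feature if not _is_missing(v)]
    if dt == 'float64':
        strategy = statistics.median
    elif dt in ('int64', 'bool'):
        strategy = statistics.mode         # most frequent value
    else:
        raise TypeError("Data type %s is invalid for imputation" % dt)
    # Assumption: fall back to the sentinel when every value is missing.
    fill = strategy(observed) if observed else sentinel
    return [fill if _is_missing(v) else v for v in feature]


nan = float('nan')
print(impute_values_sketch([1.0, nan, 3.0], 'float64', -1.0))   # [1.0, 2.0, 3.0]
print(impute_values_sketch([1, 1, nan, 2], 'int64', -1))        # [1, 1, 1, 2]
```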
-
alphapy.features.
remove_lv_features
(model, X)¶ Remove low-variance features.
- Parameters
model (alphapy.Model) – Model specifications for removing features.
X (numpy array) – The feature matrix.
- Returns
X_reduced – The reduced feature matrix.
- Return type
numpy array
References
You can find more information on low-variance feature selection here [LV].
-
alphapy.features.
save_features
(model, X_train, X_test, y_train=None, y_test=None)¶ Save new features to the model.
- Parameters
model (alphapy.Model) – Model object with train and test data.
X_train (numpy array) – Training features.
X_test (numpy array) – Testing features.
y_train (numpy array) – Training labels.
y_test (numpy array) – Testing labels.
- Returns
model – Model object with new train and test data.
- Return type
alphapy.Model
-
alphapy.features.
select_features
(model)¶ Select features with univariate selection.
- Parameters
model (alphapy.Model) – Model object with the feature selection specifications.
- Returns
model – Model object with the revised number of features.
- Return type
alphapy.Model
References
You can find more information on univariate feature selection here [UNI].
alphapy.frame module¶
-
class
alphapy.frame.
Frame
(name, space, df)¶ Bases:
object
Create a new Frame that points to a dataframe in memory. All frames are stored in Frame.frames. Names must be unique.
- Parameters
name (str) – Frame key.
space (alphapy.Space) – Namespace of the given frame.
df (pandas.DataFrame) – The contents of the actual dataframe.
- Variables
frames (dict) – Class variable for storing all known frames
Examples
>>> Frame('tech', Space('stock', 'prices', '5m'), df)
-
frames
= {}¶
-
alphapy.frame.
dump_frames
(group, directory, extension, separator)¶ Save a group of data frames to disk.
- Parameters
group (alphapy.Group) – The collection of frames to be saved to the file system.
directory (str) – Full directory specification.
extension (str) – File name extension, e.g., csv.
separator (str) – The delimiter between fields in the file.
- Returns
None
- Return type
None
-
alphapy.frame.
frame_name
(name, space)¶ Get the frame name for the given name and space.
- Parameters
name (str) – Group name.
space (alphapy.Space) – Context or namespace for the given group name.
- Returns
fname – Frame name.
- Return type
str
Examples
>>> fname = frame_name('tech', Space('stock', 'prices', '1d')) # 'tech_stock_prices_1d'
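The naming rule implied by the example above can be sketched as follows. The Space stand-in and its field names (subject, schema, fractal) are assumptions made for illustration, as is the helper name frame_name_sketch.

```python
from collections import namedtuple

# Hypothetical stand-in for alphapy.Space; the field names are assumptions.
Space = namedtuple("Space", ["subject", "schema", "fractal"])


def frame_name_sketch(name, space):
    """Join the group name and the space fields with underscores."""
    return "_".join([name, space.subject, space.schema, space.fractal])


print(frame_name_sketch("tech", Space("stock", "prices", "1d")))
```

A single flat string key of this kind makes it straightforward to look up a frame in a dictionary such as Frame.frames.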
-
alphapy.frame.
load_frames
(group, directory, extension, separator, splits=False)¶ Read a group of dataframes into memory.
- Parameters
group (alphapy.Group) – The collection of frames to be read into memory.
directory (str) – Full directory specification.
extension (str) – File name extension, e.g., csv.
separator (str) – The delimiter between fields in the file.
splits (bool, optional) – If True, then all the members of the group are stored in separate files corresponding with each member. If False, then the data are stored in a single file.
- Returns
all_frames – The list of pandas dataframes loaded from the file location. If the files cannot be located, then None is returned.
- Return type
list
-
alphapy.frame.
read_frame
(directory, filename, extension, separator, index_col=None, squeeze=False)¶ Read a delimiter-separated file into a data frame.
- Parameters
directory (str) – Full directory specification.
filename (str) – Name of the file to read, excluding the extension.
extension (str) – File name extension, e.g., csv.
separator (str) – The delimiter between fields in the file.
index_col (str, optional) – Column to use as the row labels in the dataframe.
squeeze (bool, optional) – If the data contains only one column, then return a pandas Series.
- Returns
df – The pandas dataframe loaded from the file location. If the file cannot be located, then None is returned.
- Return type
pandas.DataFrame
-
alphapy.frame.
sequence_frame
(df, target, forecast_period=1, leaders=[], lag_period=1)¶ Create sequences of lagging and leading values.
- Parameters
df (pandas.DataFrame) – The original dataframe.
target (str) – The target variable for prediction.
forecast_period (int) – The period for forecasting the target of the analysis.
leaders (list) – The features that are contemporaneous with the target.
lag_period (int) – The number of lagged rows for prediction.
- Returns
new_frame – The transformed dataframe with variable sequences.
- Return type
pandas.DataFrame
-
alphapy.frame.
write_frame
(df, directory, filename, extension, separator, index=False, index_label=None, columns=None)¶ Write a dataframe into a delimiter-separated file.
- Parameters
df (pandas.DataFrame) – The pandas dataframe to save to a file.
directory (str) – Full directory specification.
filename (str) – Name of the file to write, excluding the extension.
extension (str) – File name extension, e.g., csv.
separator (str) – The delimiter between fields in the file.
index (bool, optional) – If True, write the row names (index).
index_label (str, optional) – A column label for the index.
columns (list of str, optional) – A list of column names.
- Returns
None
- Return type
None
alphapy.globals module¶
-
class
alphapy.globals.
Encoders
(value)¶ Bases:
enum.Enum
AlphaPy Encoders.
These are the encoders used in AlphaPy, as configured in the model.yml file (features:encoding:type). You can learn more about encoders here [ENC].
-
backdiff
= 1¶
-
basen
= 2¶
-
binary
= 3¶
-
catboost
= 4¶
-
hashing
= 5¶
-
helmert
= 6¶
-
jstein
= 7¶
-
leaveone
= 8¶
-
mestimate
= 9¶
-
onehot
= 10¶
-
ordinal
= 11¶
-
polynomial
= 12¶
-
sum
= 13¶
-
target
= 14¶
-
woe
= 15¶
-
-
class
alphapy.globals.
ModelType
(value)¶ Bases:
enum.Enum
AlphaPy Model Types.
Note
One-Class Classification (oneclass) is not yet implemented.
-
classification
= 1¶
-
clustering
= 2¶
-
multiclass
= 3¶
-
oneclass
= 4¶
-
regression
= 5¶
-
-
class
alphapy.globals.
Objective
(value)¶ Bases:
enum.Enum
Scoring Function Objectives.
Best model selection is based on the scoring or Objective function, which must be either maximized or minimized. For example, roc_auc is maximized, while neg_log_loss is minimized.
-
maximize
= 1¶
-
minimize
= 2¶
-
-
class
alphapy.globals.
Orders
¶ Bases:
object
System Order Types.
- Variables
-
le
= 'le'¶
-
lh
= 'lh'¶
-
lx
= 'lx'¶
-
se
= 'se'¶
-
sh
= 'sh'¶
-
sx
= 'sx'¶
-
class
alphapy.globals.
Partition
(value)¶ Bases:
enum.Enum
AlphaPy Partitions.
-
predict
= 1¶
-
test
= 2¶
-
train
= 3¶
-
-
class
alphapy.globals.
SamplingMethod
(value)¶ Bases:
enum.Enum
AlphaPy Sampling Methods.
These are the data sampling methods used in AlphaPy, as configured in the model.yml file (data:sampling:method). You can learn more about resampling techniques here [IMB].
-
ensemble_bc
= 1¶
-
ensemble_easy
= 2¶
-
over_random
= 3¶
-
over_smote
= 4¶
-
over_smoteb
= 5¶
-
over_smotesv
= 6¶
-
overunder_smote_enn
= 7¶
-
overunder_smote_tomek
= 8¶
-
under_cluster
= 9¶
-
under_ncr
= 10¶
-
under_nearmiss
= 11¶
-
under_random
= 12¶
-
under_tomek
= 13¶
-
alphapy.group module¶
-
class
alphapy.group.
Group
(name, space=<alphapy.space.Space object>, dynamic=True, members={})¶ Bases:
object
Create a new Group that contains common members. All defined groups are stored in Group.groups. Group names must be unique.
- Parameters
name (str) – Group name.
space (alphapy.Space, optional) – Namespace for the given group.
dynamic (bool, optional, default True) – Flag for defining whether or not the group membership can change.
members (set, optional) – The initial members of the group, especially if the new group is fixed, i.e., not dynamic.
- Variables
groups (dict) – Class variable for storing all known groups
Examples
>>> Group('tech')
-
add
(newlist)¶ Add new members to the group.
- Parameters
newlist (list) – New members or identifiers to add to the group.
- Returns
None
- Return type
None
Notes
New members cannot be added to a fixed or non-dynamic group.
-
groups
= {}¶
-
member
(item)¶ Find a member in the group.
- Parameters
item (str) – The member to find in the group.
- Returns
member_exists – Flag indicating whether or not the member is in the group.
- Return type
bool
-
remove
(remlist)¶ Remove members from the group.
- Parameters
remlist (list) – The list of members to remove from the group.
- Returns
None
- Return type
None
Notes
Members cannot be removed from a fixed or non-dynamic group.
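The membership rules for dynamic and fixed groups can be sketched with a small stand-in class. GroupSketch is hypothetical and omits the space parameter; silently ignoring changes to a fixed group is an assumption, since the notes above only state that such changes are not allowed.

```python
class GroupSketch:
    """Minimal registry-backed group; not the AlphaPy implementation."""

    groups = {}  # class-level registry keyed by group name

    def __init__(self, name, dynamic=True, members=None):
        if name in GroupSketch.groups:
            raise ValueError(f"Group {name} already exists")
        self.name = name
        self.dynamic = dynamic
        # Copy the initial members so that no mutable default is shared.
        self.members = set(members or ())
        GroupSketch.groups[name] = self

    def add(self, newlist):
        # Assumption: a fixed group silently ignores additions.
        if self.dynamic:
            self.members.update(newlist)

    def member(self, item):
        return item in self.members

    def remove(self, remlist):
        # Assumption: a fixed group silently ignores removals.
        if self.dynamic:
            self.members.difference_update(remlist)


tech = GroupSketch("tech")
tech.add(["aapl", "msft"])
tech.remove(["msft"])

fixed = GroupSketch("fixed_tech", dynamic=False, members={"aapl", "goog"})
fixed.add(["tsla"])  # no effect, because the group is not dynamic
print(sorted(tech.members), sorted(fixed.members))
```

Storing members in a set keeps the membership test fast and makes duplicate additions harmless.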
alphapy.market_flow module¶
-
alphapy.market_flow.
get_market_config
()¶ Read the configuration file for MarketFlow.
- Parameters
None (None)
- Returns
specs – The parameters for controlling MarketFlow.
- Return type
dict
-
alphapy.market_flow.
main
(args=None)¶ MarketFlow Main Program
Notes
Initialize logging.
Parse the command line arguments.
Get the market configuration.
Get the model configuration.
Create the model object.
Call the main MarketFlow pipeline.
- Raises
ValueError – Training date must be before prediction date.
-
alphapy.market_flow.
market_pipeline
(model, market_specs)¶ AlphaPy MarketFlow Pipeline
- Parameters
model (alphapy.Model) – The model object for AlphaPy.
market_specs (dict) – The specifications for controlling the MarketFlow pipeline.
- Returns
model – The final results are stored in the model object.
- Return type
alphapy.Model
Notes
Define a group.
Get the market data.
Apply system features.
Create an analysis.
Run the analysis, which calls AlphaPy.
alphapy.model module¶
-
class
alphapy.model.
Model
(specs)¶ Bases:
object
Create a new model.
- Parameters
specs (dict) – The model specifications obtained by reading the model.yml file.
- Variables
specs (dict) – The model specifications.
X_train (pandas.DataFrame) – Training features in matrix format.
X_test (pandas.DataFrame) – Testing features in matrix format.
y_train (pandas.Series) – Training labels in vector format.
y_test (pandas.Series) – Testing labels in vector format.
algolist (list) – Algorithms to use in training.
estimators (dict) – Dictionary of estimators (key: algorithm)
importances (dict) – Feature Importances (key: algorithm)
coefs (dict) – Coefficients, if applicable (key: algorithm)
support (dict) – Support Vectors, if applicable (key: algorithm)
preds (dict) – Predictions or labels (keys: algorithm, partition)
probas (dict) – Probabilities from classification (keys: algorithm, partition)
metrics (dict) – Model evaluation metrics (keys: algorithm, partition, metric)
- Raises
KeyError – Model specs must include the key algorithms, which is stored in algolist.
-
alphapy.model.
first_fit
(model, algo, est)¶ Fit the model before optimization.
- Parameters
model (alphapy.Model) – The model object with specifications.
algo (str) – Abbreviation of the algorithm to run.
est (alphapy.Estimator) – The estimator to fit.
- Returns
model – The model object with the initial estimator.
- Return type
alphapy.Model
Notes
AlphaPy fits an initial model because the user may choose to get a first score without any additional feature selection or grid search. XGBoost is a special case because it has the advantage of an eval_set and early_stopping_rounds, which can speed up the estimation phase.
-
alphapy.model.
generate_metrics
(model, partition)¶ Generate model evaluation metrics for all estimators.
- Parameters
model (alphapy.Model) – The model object with stored predictions.
partition (alphapy.Partition) – Reference to the dataset.
- Returns
model – The model object with the completed metrics.
- Return type
alphapy.Model
Notes
AlphaPy takes a brute-force approach to calculating each metric. It calls every scikit-learn function without exception. If the calculation fails for any reason, then the evaluation will still continue without error.
References
For more information about model evaluation and the associated metrics, refer to [EVAL].
-
alphapy.model.
get_model_config
()¶ Read in the configuration file for AlphaPy.
- Parameters
None (None)
- Returns
specs – The parameters for controlling AlphaPy.
- Return type
dict
- Raises
ValueError – Unrecognized value of a model.yml field.
-
alphapy.model.
load_feature_map
(model, directory)¶ Load the feature map from storage. By default, the most recent feature map is loaded into memory.
- Parameters
model (alphapy.Model) – The model object to contain the feature map.
directory (str) – Full directory specification of the feature map’s location.
- Returns
model – The model object containing the feature map.
- Return type
alphapy.Model
-
alphapy.model.
load_predictor
(directory)¶ Load the model predictor from storage. By default, the most recent model is loaded into memory.
- Parameters
directory (str) – Full directory specification of the predictor’s location.
- Returns
predictor – The scoring function.
- Return type
function
-
alphapy.model.
make_predictions
(model, algo, calibrate)¶ Make predictions for the training and testing data.
- Parameters
model (alphapy.Model) – The model object with specifications.
algo (str) – Abbreviation of the algorithm to make predictions.
calibrate (bool) – If True, calibrate the probabilities for a classifier.
- Returns
model – The model object with the predictions.
- Return type
alphapy.Model
Notes
For classification, calibration is a precursor to making the actual predictions. In this case, AlphaPy predicts both labels and probabilities. For regression, real values are predicted.
-
alphapy.model.
predict_best
(model)¶ Select the best model based on score.
- Parameters
model (alphapy.Model) – The model object with all of the estimators.
- Returns
model – The model object with the best estimator.
- Return type
alphapy.Model
Notes
Best model selection is based on a scoring function. If the objective is to minimize (e.g., negative log loss), then we select the model with the algorithm that has the lowest score. If the objective is to maximize, then we select the algorithm with the highest score (e.g., AUC).
For multiple algorithms, AlphaPy always creates a blended model. Therefore, the best algorithm that is selected could actually be the blended model itself.
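The selection rule in the notes above can be sketched as follows. The Objective enum mirrors the values documented for alphapy.globals.Objective; the function name best_algorithm, the algorithm keys, and the scores are hypothetical.

```python
from enum import Enum


class Objective(Enum):
    # Values mirror alphapy.globals.Objective as documented.
    maximize = 1
    minimize = 2


def best_algorithm(scores, objective):
    """Return the algorithm key with the best score for the objective."""
    pick = max if objective is Objective.maximize else min
    return pick(scores, key=scores.get)


# Hypothetical scores, including an entry for the blended model.
scores = {"RF": 0.81, "XGB": 0.84, "BLEND": 0.86}
print(best_algorithm(scores, Objective.maximize))
```

Because the blended model is scored alongside the individual algorithms, it competes on equal terms and can be the one selected.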
-
alphapy.model.
predict_blend
(model)¶ Make predictions from a blended model.
- Parameters
model (alphapy.Model) – The model object with all of the estimators.
- Returns
model – The model object with the blended estimator.
- Return type
alphapy.Model
Notes
For classification, AlphaPy uses logistic regression for creating a blended model. For regression, ridge regression is applied.
-
alphapy.model.
save_feature_map
(model, timestamp)¶ Save the feature map to disk.
- Parameters
model (alphapy.Model) – The model object containing the feature map.
timestamp (str) – Date in yyyy-mm-dd format.
- Returns
None
- Return type
None
-
alphapy.model.
save_model
(model, tag, partition)¶ Save the results in the model file.
- Parameters
model (alphapy.Model) – The model object to save.
tag (str) – A unique identifier for the output files, e.g., a date stamp.
partition (alphapy.Partition) – Reference to the dataset.
- Returns
None
- Return type
None
Notes
The following components are extracted from the model object and saved to disk:
Model predictor (via joblib/pickle)
Predictions
Probabilities (classification only)
Rankings
Submission File (optional)
-
alphapy.model.
save_predictions
(model, tag, partition)¶ Save the predictions to disk.
- Parameters
model (alphapy.Model) – The model object to save.
tag (str) – A unique identifier for the output files, e.g., a date stamp.
partition (alphapy.Partition) – Reference to the dataset.
- Returns
preds (numpy array) – The prediction vector.
probas (numpy array) – The probability vector.
-
alphapy.model.
save_predictor
(model, timestamp)¶ Save the time-stamped model predictor to disk.
- Parameters
model (alphapy.Model) – The model object that contains the best estimator.
timestamp (str) – Date in yyyy-mm-dd format.
- Returns
None
- Return type
None
alphapy.optimize module¶
-
alphapy.optimize.
grid_report
(results, n_top=3)¶ Report the top grid search scores.
- Parameters
results (dict of numpy arrays) – Mean test scores for each grid search iteration.
n_top (int, optional) – The number of grid search results to report.
- Returns
None
- Return type
None
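A minimal sketch of ranking grid search results is shown below. It assumes that results resembles the cv_results_ dictionary of a scikit-learn search, with mean_test_score and params entries, and that a higher score is better; the function name top_grid_scores is hypothetical, and it returns the rows instead of logging them.

```python
def top_grid_scores(results, n_top=3):
    """Return (rank, mean score, params) for the n_top best candidates."""
    scores = list(results["mean_test_score"])
    params = list(results["params"])
    # Candidate indices sorted from highest to lowest mean test score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(rank, scores[i], params[i])
            for rank, i in enumerate(order[:n_top], start=1)]


results = {
    "mean_test_score": [0.70, 0.90, 0.80, 0.60],
    "params": [{"C": 1}, {"C": 10}, {"C": 100}, {"C": 1000}],
}
for rank, score, params in top_grid_scores(results, n_top=2):
    print(rank, score, params)
```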
-
alphapy.optimize.
hyper_grid_search
(model, estimator)¶ Return the best hyperparameters for a grid search.
- Parameters
model (alphapy.Model) – The model object with grid search parameters.
estimator (alphapy.Estimator) – The estimator containing the hyperparameter grid.
- Returns
model – The model object with the grid search estimator.
- Return type
alphapy.Model
Notes
To reduce the time required for grid search, use either randomized grid search with a fixed number of iterations or a full grid search with subsampling. AlphaPy uses the scikit-learn Pipeline with feature selection to reduce the feature space.
References
For more information about grid search, refer to [GRID].
To learn about pipelines, refer to [PIPE].
-
alphapy.optimize.
rfecv_search
(model, algo)¶ Return the best feature set using recursive feature elimination with cross-validation.
- Parameters
model (alphapy.Model) – The model object with RFE parameters.
algo (str) – Abbreviation of the algorithm to run.
- Returns
model – The model object with the RFE support vector and the best estimator.
- Return type
alphapy.Model
Notes
If a scoring function is available, then AlphaPy can perform RFE with Cross-Validation (CV), as in this function; otherwise, it just does RFE without CV.
References
For more information about Recursive Feature Elimination, refer to [RFECV].
alphapy.plots module¶
-
alphapy.plots.
generate_plots
(model, partition)¶ Generate plots while running the pipeline.
- Parameters
model (alphapy.Model) – The model object with plotting specifications.
partition (alphapy.Partition) – Reference to the dataset.
- Returns
None
- Return type
None
-
alphapy.plots.
get_partition_data
(model, partition)¶ Get the X, y pair for a given model and partition
- Parameters
model (alphapy.Model) – The model object with partition data.
partition (alphapy.Partition) – Reference to the dataset.
- Returns
X (numpy array) – The feature matrix.
y (numpy array) – The target vector.
- Raises
TypeError – Partition must be train or test.
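The partition lookup can be sketched with an enum and a plain dictionary. The Partition values mirror those documented for alphapy.globals.Partition; the data dictionary is a hypothetical stand-in for the model object, and partition_data_sketch is not an AlphaPy function.

```python
from enum import Enum


class Partition(Enum):
    # Values mirror alphapy.globals.Partition as documented.
    predict = 1
    test = 2
    train = 3


def partition_data_sketch(data, partition):
    """Return the (X, y) pair stored for the given partition."""
    if partition is Partition.train:
        return data["X_train"], data["y_train"]
    if partition is Partition.test:
        return data["X_test"], data["y_test"]
    raise TypeError("Partition must be train or test")


# Hypothetical data standing in for the arrays held by the model.
data = {"X_train": [[1, 2], [3, 4]], "y_train": [0, 1],
        "X_test": [[5, 6]], "y_test": [1]}
X, y = partition_data_sketch(data, Partition.test)
print(X, y)
```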
-
alphapy.plots.
get_plot_directory
(model)¶ Get the plot output directory of a model.
- Parameters
model (alphapy.Model) – The model object with directory information.
- Returns
plot_directory – The output directory to write the plot.
- Return type
str
-
alphapy.plots.
plot_boundary
(model, partition, f1=0, f2=1)¶ Display a comparison of classifiers
- Parameters
model (alphapy.Model) – The model object with plotting specifications.
partition (alphapy.Partition) – Reference to the dataset.
f1 (int) – Number of the first feature to compare.
f2 (int) – Number of the second feature to compare.
- Returns
None
- Return type
None
References
Code excerpts from authors:
Gael Varoquaux
Andreas Muller
http://scikit-learn.org/stable/auto_examples/classification/plot_classifier_comparison.html
-
alphapy.plots.
plot_box
(df, x, y, hue, tag='eda', directory=None)¶ Display a Box Plot.
- Parameters
df (pandas.DataFrame) – The dataframe containing the x and y features.
x (str) – Variable name in df to display along the x-axis.
y (str) – Variable name in df to display along the y-axis.
hue (str) – Variable name to be used as hue, i.e., another data dimension.
tag (str) – Unique identifier for the plot.
directory (str, optional) – The full specification of the plot location.
- Returns
None
- Return type
None.
References
-
alphapy.plots.
plot_calibration
(model, partition)¶ Display scikit-learn calibration plots.
- Parameters
model (alphapy.Model) – The model object with plotting specifications.
partition (alphapy.Partition) – Reference to the dataset.
- Returns
None
- Return type
None
References
Code excerpts from authors:
Alexandre Gramfort <alexandre.gramfort@telecom-paristech.fr>
Jan Hendrik Metzen <jhm@informatik.uni-bremen.de>
-
alphapy.plots.
plot_candlestick
(df, symbol, datecol='date', directory=None)¶ Plot time series data.
- Parameters
df (pandas.DataFrame) – The dataframe containing the target feature.
symbol (str) – Unique identifier of the data to plot.
datecol (str, optional) – The name of the date column.
directory (str, optional) – The full specification of the plot location.
- Returns
None
- Return type
None.
Notes
The dataframe df must contain these columns: open, high, low, and close.
References
http://bokeh.pydata.org/en/latest/docs/gallery/candlestick.html
-
alphapy.plots.
plot_confusion_matrix
(model, partition)¶ Draw the confusion matrix.
- Parameters
model (alphapy.Model) – The model object with plotting specifications.
partition (alphapy.Partition) – Reference to the dataset.
- Returns
None
- Return type
None
References
http://scikit-learn.org/stable/modules/model_evaluation.html#confusion-matrix
-
alphapy.plots.
plot_distribution
(df, target, tag='eda', directory=None)¶ Display a Distribution Plot.
- Parameters
df (pandas.DataFrame) – The dataframe containing the target feature.
target (str) – The target variable for the distribution plot.
tag (str) – Unique identifier for the plot.
directory (str, optional) – The full specification of the plot location.
- Returns
None
- Return type
None.
References
-
alphapy.plots.
plot_facet_grid
(df, target, frow, fcol, tag='eda', directory=None)¶ Plot a Seaborn faceted histogram grid.
- Parameters
df (pandas.DataFrame) – The dataframe containing the features.
target (str) – The target variable for contrast.
frow (list of str) – Feature names for the row elements of the grid.
fcol (list of str) – Feature names for the column elements of the grid.
tag (str) – Unique identifier for the plot.
directory (str, optional) – The full specification of the plot location.
- Returns
None
- Return type
None.
References
-
alphapy.plots.
plot_importance
(model, partition)¶ Display scikit-learn feature importances.
- Parameters
model (alphapy.Model) – The model object with plotting specifications.
partition (alphapy.Partition) – Reference to the dataset.
- Returns
None
- Return type
None
References
http://scikit-learn.org/stable/auto_examples/ensemble/plot_forest_importances.html
-
alphapy.plots.
plot_learning_curve
(model, partition)¶ Generate learning curves for a given partition.
- Parameters
model (alphapy.Model) – The model object with plotting specifications.
partition (alphapy.Partition) – Reference to the dataset.
- Returns
None
- Return type
None
References
http://scikit-learn.org/stable/auto_examples/ensemble/plot_forest_importances.html
-
alphapy.plots.
plot_partial_dependence
(est, X, features, fnames, tag, n_jobs=- 1, verbosity=0, directory=None)¶ Display a Partial Dependence Plot.
- Parameters
est (estimator) – The scikit-learn estimator for calculating partial dependence.
X (numpy array) – The data on which the estimator was trained.
features (list of int) – Feature numbers of X.
fnames (list of str) – The feature names to plot.
tag (str) – Unique identifier for the plot.
n_jobs (int, optional) – The maximum number of parallel jobs.
verbosity (int, optional) – The amount of logging from 0 (minimum) and higher.
directory (str) – Directory where the plot will be stored.
- Returns
None
- Return type
None.
References
-
alphapy.plots.
plot_roc_curve
(model, partition)¶ Display ROC Curves with Cross-Validation.
- Parameters
model (alphapy.Model) – The model object with plotting specifications.
partition (alphapy.Partition) – Reference to the dataset.
- Returns
None
- Return type
None
References
http://scikit-learn.org/stable/modules/model_evaluation.html#receiver-operating-characteristic-roc
-
alphapy.plots.
plot_scatter
(df, features, target, tag='eda', directory=None)¶ Plot a scatterplot matrix, also known as a pair plot.
- Parameters
df (pandas.DataFrame) – The dataframe containing the features.
features (list of str) – The features to compare in the scatterplot.
target (str) – The target variable for contrast.
tag (str) – Unique identifier for the plot.
directory (str, optional) – The full specification of the plot location.
- Returns
None
- Return type
None.
References
-
alphapy.plots.
plot_swarm
(df, x, y, hue, tag='eda', directory=None)¶ Display a Swarm Plot.
- Parameters
df (pandas.DataFrame) – The dataframe containing the x and y features.
x (str) – Variable name in df to display along the x-axis.
y (str) – Variable name in df to display along the y-axis.
hue (str) – Variable name to be used as hue, i.e., another data dimension.
tag (str) – Unique identifier for the plot.
directory (str, optional) – The full specification of the plot location.
- Returns
None
- Return type
None.
References
-
alphapy.plots.
plot_time_series
(df, target, tag='eda', directory=None)¶ Plot time series data.
- Parameters
df (pandas.DataFrame) – The dataframe containing the target feature.
target (str) – The target variable for the time series plot.
tag (str) – Unique identifier for the plot.
directory (str, optional) – The full specification of the plot location.
- Returns
None
- Return type
None.
References
-
alphapy.plots.
plot_validation_curve
(model, partition, pname, prange)¶ Generate scikit-learn validation curves.
- Parameters
model (alphapy.Model) – The model object with plotting specifications.
partition (alphapy.Partition) – Reference to the dataset.
pname (str) – Name of the hyperparameter to test.
prange (numpy array) – The values of the hyperparameter that will be evaluated.
- Returns
None
- Return type
None
References
-
alphapy.plots.
write_plot
(vizlib, plot, plot_type, tag, directory=None)¶ Save the plot to a file, or display it interactively.
- Parameters
vizlib (str) – The visualization library: 'matplotlib', 'seaborn', or 'bokeh'.
plot (module) – Plotting context, e.g., plt.
plot_type (str) – Type of plot to generate.
tag (str) – Unique identifier for the plot.
directory (str, optional) – The full specification for the directory location. If directory is None, then the plot is displayed interactively.
- Returns
None
- Return type
None.
- Raises
ValueError – Unrecognized data visualization library.
References
Visualization Libraries:
Matplotlib : http://matplotlib.org/
Seaborn : https://seaborn.pydata.org/
alphapy.portfolio module¶
-
class
alphapy.portfolio.
Portfolio
(group_name, tag, space=<alphapy.space.Space object>, maxpos=10, posby='close', kopos=0, koby='-profit', restricted=False, weightby='quantity', startcap=100000, margin=0.5, mincash=0.2, fixedfrac=0.1, maxloss=0.1)¶ Bases:
object
Create a new portfolio with a unique name. All portfolios are stored in Portfolio.portfolios.
- Parameters
group_name (str) – The group represented in the portfolio.
tag (str) – A unique identifier.
space (alphapy.Space, optional) – Namespace for the portfolio.
maxpos (int, optional) – The maximum number of positions.
posby (str, optional) – The denominator for position sizing.
kopos (int, optional) – The number of positions to kick out from the portfolio.
koby (str, optional) – The “kick out” criterion. For example, with kopos set to 3, a koby value of ‘-profit’ means the three least profitable positions will be closed.
restricted (bool, optional) – If True, then the portfolio is limited to a maximum number of positions maxpos.
weightby (str, optional) – The weighting variable to balance the portfolio, e.g., by closing price, by volatility, or by any column.
startcap (float, optional) – The amount of starting capital.
margin (float, optional) – The amount of margin required, expressed as a fraction.
mincash (float, optional) – Minimum amount of cash on hand, expressed as a fraction of the total portfolio value.
fixedfrac (float, optional) – The fixed fraction for any given position.
maxloss (float, optional) – Stop loss for any given position.
- Variables
portfolios (dict) – Class variable for storing all known portfolios
value (float) – The current value of the portfolio.
netprofit (float) – Net profit ($) since previous valuation.
netreturn (float) – Net return (%) since previous valuation.
totalprofit (float) – Total profit ($) since inception.
totalreturn (float) – Total return (%) since inception.
-
portfolios
= {}¶
-
class
alphapy.portfolio.
Position
(portfolio, name, opendate)¶ Bases:
object
Create a new position in the portfolio.
- Parameters
portfolio (alphapy.Portfolio) – The portfolio that will contain the position.
name (str) – A unique identifier such as a stock symbol.
opendate (datetime) – Date the position is opened.
- Variables
date (datetime) – Current date of the position.
name (str) – A unique identifier.
status (str) – State of the position: 'opened' or 'closed'.
mpos (str) – Market position: 'long' or 'short'.
quantity (float) – The net size of the position.
price (float) – The current price of the instrument.
value (float) – The total dollar value of the position.
profit (float) – The net profit of the current position.
netreturn (float) – The Return On Investment (ROI), or net return.
opened (datetime) – Date the position is opened.
held (int) – The holding period since the position was opened.
costbasis (float) – Overall cost basis.
trades (list of Trade) – The executed trades for the position so far.
ntrades (int) – Total number of trades.
pdata (pandas DataFrame) – Price data for the given name.
multiplier (float) – Multiple for instrument type (e.g., 1.0 for stocks).
-
class
alphapy.portfolio.
Trade
(name, order, quantity, price, tdate)¶ Bases:
object
Initiate a trade.
- Parameters
name (str) – The symbol to trade.
order (alphapy.Orders) – Long or short trade for entry or exit.
quantity (int) – The quantity for the order.
price (float) – The execution price of the trade.
tdate (datetime) – The date and time of the trade.
- Variables
states (list of str) – Trade state names for a dataframe.
-
states
= ['name', 'order', 'quantity', 'price']¶
-
alphapy.portfolio.
add_position
(p, name, pos)¶ Add a position to a portfolio.
- Parameters
p (alphapy.Portfolio) – Portfolio that will hold the position.
name (str) – Unique identifier for the position, e.g., a stock symbol.
pos (alphapy.Position) – New position to add to the portfolio.
- Returns
p – Portfolio with the new position.
- Return type
alphapy.Portfolio
-
alphapy.portfolio.
allocate_trade
(p, pos, trade)¶ Determine the trade allocation for a given portfolio.
- Parameters
p (alphapy.Portfolio) – Portfolio that will hold the new position.
pos (alphapy.Position) – Position to update.
trade (alphapy.Trade) – The proposed trade.
- Returns
allocation – The trade size that can be allocated for the portfolio.
- Return type
float
-
alphapy.portfolio.
balance
(p, tdate, cashlevel)¶ Balance the portfolio using a weighting variable.
Rebalancing is the process of equalizing a portfolio’s positions using some criterion. For example, if a portfolio is dollar-weighted, then one position can increase in proportion to the rest of the portfolio, i.e., its fraction of the overall portfolio is greater than the other positions. To make the portfolio “equal dollar”, some positions have to be decreased and others increased.
The rebalancing process is periodic (e.g., once per month) and generates a series of trades to balance the positions. Other portfolios are volatility-weighted because a more volatile stock has a greater effect on the beta, i.e., the more volatile the instrument, the smaller the position size.
Technically, any type of weight can be used for rebalancing, so AlphaPy gives the user the ability to specify a weightby column name.
- Parameters
p (alphapy.Portfolio) – Portfolio to rebalance.
tdate (datetime) – The rebalancing date.
cashlevel (float) – The cash level to maintain during rebalancing.
- Returns
p – The rebalanced portfolio.
- Return type
alphapy.Portfolio
Notes
Warning
The portfolio management functions balance, kick_out, and stop_loss are not part of the main StockStream pipeline, and thus have not been thoroughly tested. Feel free to exercise the code and report any issues.
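The “equal dollar” case described for balance can be worked through with a small sketch. It is an illustration under stated assumptions, not the AlphaPy routine: equal_dollar_trades is a hypothetical name, position values are passed as a plain dictionary, and cashlevel is treated as the fraction of the total value held back as cash.

```python
def equal_dollar_trades(values, cashlevel=0.0):
    """Return the dollar change per position that equalizes the holdings.

    values    -- current dollar value of each position, keyed by name
    cashlevel -- fraction of the total value to keep as cash (assumption)
    """
    if not values:
        return {}
    total = sum(values.values())
    target = total * (1.0 - cashlevel) / len(values)
    # Positive amounts are buys; negative amounts are sells.
    return {name: target - value for name, value in values.items()}


positions = {"A": 6000.0, "B": 4000.0, "C": 2000.0}
print(equal_dollar_trades(positions))
print(equal_dollar_trades(positions, cashlevel=0.25))
```

With no cash held back, each position is moved toward one third of the total, so the largest position is reduced and the smallest is increased by the same amount.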
-
alphapy.portfolio.
close_position
(p, position, tdate)¶ Close the position and remove it from the portfolio.
- Parameters
p (alphapy.Portfolio) – Portfolio holding the position.
position (alphapy.Position) – Position to close.
tdate (datetime) – The date for pricing the closed position.
- Returns
p – Portfolio with the removed position.
- Return type
alphapy.Portfolio
-
alphapy.portfolio.
delete_portfolio
(p)¶ Delete the portfolio.
- Parameters
p (alphapy.Portfolio) – Portfolio to delete.
- Returns
None
- Return type
None
-
alphapy.portfolio.
deposit_portfolio
(p, cash, tdate)¶ Deposit cash into a given portfolio.
- Parameters
p (alphapy.Portfolio) – Portfolio to accept the deposit.
cash (float) – Cash amount to deposit.
tdate (datetime) – The date of deposit.
- Returns
p – Portfolio with the added cash.
- Return type
alphapy.Portfolio
-
alphapy.portfolio.
exec_trade
(p, name, order, quantity, price, tdate)¶ Execute a trade for a portfolio.
- Parameters
p (alphapy.Portfolio) – Portfolio in which to trade.
name (str) – The symbol to trade.
order (alphapy.Orders) – Long or short trade for entry or exit.
quantity (int) – The quantity for the order.
price (float) – The execution price of the trade.
tdate (datetime) – The date and time of the trade.
- Returns
tsize – The executed trade size.
- Return type
float
- Other Parameters
Frame.frames (dict) – Dataframe for the price data.
-
alphapy.portfolio.
gen_portfolio
(model, system, group, tframe, startcap=100000, posby='close')¶ Create a portfolio from a trades frame.
- Parameters
model (alphapy.Model) – The model with specifications.
system (str) – Name of the system.
group (alphapy.Group) – The group of instruments in the portfolio.
tframe (pandas.DataFrame) – The input trade list from running the system.
startcap (float) – Starting capital.
posby (str) – The position sizing column in the price dataframe.
- Returns
p – The generated portfolio.
- Return type
alphapy.Portfolio
- Raises
MemoryError – Could not allocate Portfolio.
Notes
This function also generates the files required for analysis by the pyfolio package:
Returns File
Positions File
Transactions File
-
alphapy.portfolio.
kick_out
(p, tdate)¶ Trim the portfolio based on filter criteria.
To reduce a portfolio’s positions, AlphaPy can rank the positions on some criterion, such as open profit or net return. On a periodic basis, the worst performers can be culled from the portfolio.
- Parameters
p (alphapy.Portfolio) – The portfolio for reducing positions.
tdate (datetime) – The date to trim the portfolio positions.
- Returns
p – The reduced portfolio.
- Return type
alphapy.Portfolio
Notes
Warning
The portfolio management functions kick_out, balance, and stop_loss are not part of the main StockStream pipeline, and thus have not been thoroughly tested. Feel free to exercise the code and report any issues.
-
alphapy.portfolio.
portfolio_name
(group_name, tag)¶ Return the name of the portfolio.
- Parameters
group_name (str) – The group represented in the portfolio.
tag (str) – A unique identifier.
- Returns
port_name – Portfolio name.
- Return type
str
-
alphapy.portfolio.
remove_position
(p, name)¶ Remove a position from a portfolio by name.
- Parameters
p (alphapy.Portfolio) – Portfolio with the current position.
name (str) – Unique identifier for the position, e.g., a stock symbol.
- Returns
p – Portfolio with the deleted position.
- Return type
alphapy.Portfolio
-
alphapy.portfolio.
stop_loss
(p, tdate)¶ Trim the portfolio based on stop-loss criteria.
- Parameters
p (alphapy.Portfolio) – The portfolio for reducing positions based on maxloss.
tdate (datetime) – The date to trim any underperforming positions.
- Returns
p – The reduced portfolio.
- Return type
alphapy.Portfolio
Notes
Warning
The portfolio management functions stop_loss, balance, and kick_out are not part of the main StockStream pipeline, and thus have not been thoroughly tested. Feel free to exercise the code and report any issues.
-
alphapy.portfolio.
update_portfolio
(p, pos, trade)¶ Update the portfolio positions.
- Parameters
p (alphapy.Portfolio) – Portfolio holding the position.
pos (alphapy.Position) – Position to update.
trade (alphapy.Trade) – Trade for updating the position and portfolio.
- Returns
p – Portfolio with the revised position.
- Return type
alphapy.Portfolio
-
alphapy.portfolio.
update_position
(position, trade)¶ Add the new trade to the position and revalue.
- Parameters
position (alphapy.Position) – The position to be updated.
trade (alphapy.Trade) – Trade for updating the position.
- Returns
position – New value of the position.
- Return type
alphapy.Position
-
alphapy.portfolio.
valuate_portfolio
(p, tdate)¶ Value the portfolio based on the current positions.
- Parameters
p (alphapy.Portfolio) – Portfolio for calculating profit and return.
tdate (datetime) – The date of valuation.
- Returns
p – Portfolio with the new valuation.
- Return type
alphapy.Portfolio
-
alphapy.portfolio.
valuate_position
(position, tdate)¶ Valuate the position for the given date.
- Parameters
position (alphapy.Position) – The position to be valued.
tdate (datetime) – Date to value the position.
- Returns
position – New value of the position.
- Return type
alphapy.Position
Notes
An Example of Cost Basis

========  ======  =====  ======
Date      Shares  Price  Amount
========  ======  =====  ======
11/09/16  +100    10.0   1,000
12/14/16  +200    15.0   3,000
04/05/17  -500    20.0   10,000
All       800            14,000
========  ======  =====  ======

The cost basis is calculated as the total value of all trades (14,000) divided by the total number of shares traded (800), so 14,000 / 800 = 17.5, and the net position is -200.
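The arithmetic in this example can be checked with a few lines of plain Python. This is only a worked calculation of the figures above, not AlphaPy's implementation; the tuple layout is chosen for illustration.

```python
trades = [
    ("11/09/16", 100, 10.0),
    ("12/14/16", 200, 15.0),
    ("04/05/17", -500, 20.0),
]

# Total traded value and total shares traded use absolute share counts.
total_amount = sum(abs(shares) * price for _, shares, price in trades)
total_shares = sum(abs(shares) for _, shares, _ in trades)
cost_basis = total_amount / total_shares
# The net position keeps the sign of each trade.
net_position = sum(shares for _, shares, _ in trades)

print(cost_basis, net_position)  # 17.5 -200
```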
-
alphapy.portfolio.
withdraw_portfolio
(p, cash, tdate)¶ Withdraw cash from a given portfolio.
- Parameters
p (alphapy.Portfolio) – Portfolio to accept the withdrawal.
cash (float) – Cash amount to withdraw.
tdate (datetime) – The date of withdrawal.
- Returns
p – Portfolio with the withdrawn cash.
- Return type
alphapy.Portfolio
alphapy.space module¶
-
class
alphapy.space.
Space
(subject='stock', schema='prices', fractal='1d')¶ Bases:
object
Create a new namespace.
- Parameters
subject (str) – An identifier for a group of related items.
schema (str) – The data related to the subject.
fractal (str) – The time fractal of the data, e.g., “5m” or “1d”.
-
alphapy.space.
space_name
(subject, schema, fractal)¶ Get the namespace string.
- Parameters
subject (str) – An identifier for a group of related items.
schema (str) – The data related to the subject.
fractal (str) – The time fractal of the data, e.g., “5m” or “1d”.
- Returns
name – The joined namespace string.
- Return type
str
alphapy.sport_flow module¶
-
alphapy.sport_flow.
add_features
(frame, fdict, flen, prefix='')¶ Add new features to a dataframe with the specified dictionary.
- Parameters
frame (pandas.DataFrame) – The dataframe to extend with new features defined by fdict.
fdict (dict) – A dictionary of column names (key) and data types (value).
flen (int) – Length of frame.
prefix (str, optional) – Prepend all columns with a prefix.
- Returns
frame – The dataframe with the added features.
- Return type
pandas.DataFrame
-
alphapy.sport_flow.
generate_delta_data
(frame, fdict, prefix1, prefix2)¶ Subtract two similar columns to get the delta value.
- Parameters
frame (pandas.DataFrame) – The input model frame.
fdict (dict) – A dictionary of column names (key) and data types (value).
prefix1 (str) – The prefix of the first team.
prefix2 (str) – The prefix of the second team.
- Returns
frame – The completed dataframe with the delta data.
- Return type
pandas.DataFrame
-
alphapy.sport_flow.
generate_team_frame
(team, tf, home_team, away_team, window)¶ Calculate statistics for each team.
- Parameters
team (str) – The abbreviation for the team.
tf (pandas.DataFrame) – The initial team frame.
home_team (str) – Label for the home team.
away_team (str) – Label for the away team.
window (int) – The value for the rolling window to calculate means and sums.
- Returns
tf – The completed team frame.
- Return type
pandas.DataFrame
-
alphapy.sport_flow.
get_day_offset
(date_vector)¶ Compute the day offsets between games.
- Parameters
date_vector (pandas.Series) – The date column.
- Returns
day_offset – A vector of day offsets between adjacent dates.
- Return type
pandas.Series
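A minimal sketch of the day-offset idea, using only the standard library rather than a pandas Series. The helper name `day_offsets` is hypothetical, and assigning 0 to the first game is an assumption, since the first date has no predecessor.

```python
from datetime import date


def day_offsets(dates):
    """Return the number of days between each date and the one before it."""
    if not dates:
        return []
    # The first game has no predecessor, so its offset is set to 0 (an assumption).
    offsets = [0]
    for previous, current in zip(dates, dates[1:]):
        offsets.append((current - previous).days)
    return offsets


games = [date(2017, 1, 1), date(2017, 1, 3), date(2017, 1, 10)]
print(day_offsets(games))  # [0, 2, 7]
```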
-
alphapy.sport_flow.
get_losses
(point_margin)¶ Determine a loss based on the point margin.
- Parameters
point_margin (int) – The point margin can be positive, zero, or negative.
- Returns
lost – If the point margin is less than 0, return 1, else 0.
- Return type
int
-
alphapy.sport_flow.
get_point_margin
(row, score, opponent_score)¶ Get the point margin for a game.
- Parameters
row (pandas.Series) – The row of a game.
score (int) – The score for one team.
opponent_score (int) – The score for the other team.
- Returns
point_margin – The resulting point margin (0 if NaN).
- Return type
int
-
alphapy.sport_flow.
get_series_diff
(series)¶ Perform the difference operation on a series.
- Parameters
series (pandas.Series) – The series for the diff operation.
- Returns
new_series – The differenced series.
- Return type
pandas.Series
-
alphapy.sport_flow.
get_sport_config
()¶ Read the configuration file for SportFlow.
- Parameters
None (None)
- Returns
specs – The parameters for controlling SportFlow.
- Return type
dict
-
alphapy.sport_flow.
get_streak
(series, start_index, window)¶ Calculate the current streak.
- Parameters
series (pandas.Series) – A Boolean series for calculating streaks.
start_index (int) – The offset of the series to start counting.
window (int) – The period over which to count.
- Returns
streak – The count value for the current streak.
- Return type
int
-
alphapy.sport_flow.
get_team_frame
(game_frame, team, home, away)¶ Extract the frame for a given team from the game frame.
- Parameters
game_frame (pandas.DataFrame) – The game frame for a given season.
team (str) – The team abbreviation.
home (str) – The label of the home team column.
away (str) – The label of the away team column.
- Returns
team_frame – The extracted team frame.
- Return type
pandas.DataFrame
-
alphapy.sport_flow.
get_ties
(point_margin)¶ Determine a tie based on the point margin.
- Parameters
point_margin (int) – The point margin can be positive, zero, or negative.
- Returns
tied – If the point margin is equal to 0, return 1, else 0.
- Return type
int
-
alphapy.sport_flow.
get_wins
(point_margin)¶ Determine a win based on the point margin.
- Parameters
point_margin (int) – The point margin can be positive, zero, or negative.
- Returns
won – If the point margin is greater than 0, return 1, else 0.
- Return type
int
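The three outcome functions get_losses, get_ties, and get_wins apply the same rule to the sign of the point margin. The sketch below combines them in one hypothetical helper, `outcome_flags`, purely to show the mapping; it is not the library code.

```python
def outcome_flags(point_margin):
    """Return (won, lost, tied) as 0/1 flags for a single game."""
    won = 1 if point_margin > 0 else 0
    lost = 1 if point_margin < 0 else 0
    tied = 1 if point_margin == 0 else 0
    return won, lost, tied


for margin in (7, -3, 0):
    print(margin, outcome_flags(margin))
```

Exactly one of the three flags is set for any margin, so summing each flag over a season yields the win, loss, and tie counts.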
-
alphapy.sport_flow.
insert_model_data
(mf, mpos, mdict, tf, tpos, prefix)¶ Insert a row from the team frame into the model frame.
- Parameters
mf (pandas.DataFrame) – The model frame for a single season.
mpos (int) – The position in the model frame where to insert the row.
mdict (dict) – A dictionary of column names (key) and data types (value).
tf (pandas.DataFrame) – The team frame for a season.
tpos (int) – The position of the row in the team frame.
prefix (str) – The prefix to join with the mdict key.
- Returns
mf – The model frame with the inserted row.
- Return type
pandas.DataFrame
-
alphapy.sport_flow.
main
(args=None)¶ The main program for SportFlow.
Notes
Initialize logging.
Parse the command line arguments.
Get the game configuration.
Get the model configuration.
Generate game frames for each season.
Create statistics for each team.
Merge the team frames into the final model frame.
Run the AlphaPy pipeline.
- Raises
ValueError – Training date must be before prediction date.
alphapy.system module¶
-
class
alphapy.system.
System
(name, longentry, shortentry=None, longexit=None, shortexit=None, holdperiod=0, scale=False)¶ Bases:
object
Create a new system. All systems are stored in System.systems. Duplicate names are not allowed.
- Parameters
name (str) – The system name.
longentry (str) – Name of the conditional feature for a long entry.
shortentry (str, optional) – Name of the conditional feature for a short entry.
longexit (str, optional) – Name of the conditional feature for a long exit.
shortexit (str, optional) – Name of the conditional feature for a short exit.
holdperiod (int, optional) – Holding period of a position.
scale (bool, optional) – Add to a position for a signal in the same direction.
- Variables
systems (dict) – Class variable for storing all known systems
Examples
>>> System('closer', 'hc', 'lc')
-
systems
= {}¶
-
alphapy.system.
run_system
(model, system, group, intraday=False, quantity=1)¶ Run a system for a given group, creating a trades frame.
- Parameters
model (alphapy.Model) – The model object with specifications.
system (alphapy.System) – The system to run.
group (alphapy.Group) – The group of symbols to trade.
intraday (bool, optional) – If true, this is an intraday system.
quantity (float, optional) – The amount to trade for each symbol, e.g., number of shares.
- Returns
tf – All of the trades for this group.
- Return type
pandas.DataFrame
-
alphapy.system.
trade_system
(model, system, space, intraday, name, quantity)¶ Trade the given system.
- Parameters
model (alphapy.Model) – The model object with specifications.
system (alphapy.System) – The long/short system to run.
space (alphapy.Space) – Namespace of instrument prices.
intraday (bool) – If True, then run an intraday system.
name (str) – The symbol to trade.
quantity (float) – The amount of the name to trade, e.g., number of shares.
- Returns
tradelist – List of trade entries and exits.
- Return type
list
- Other Parameters
Frame.frames (dict) – All of the data frames containing price data.
alphapy.transforms module¶
-
alphapy.transforms.
abovema
(f, c, p=50)¶ Determine those values of the dataframe that are above the moving average.
- Parameters
f (pandas.DataFrame) – Dataframe containing the column c.
c (str) – Name of the column in the dataframe f.
p (int) – The period of the moving average.
- Returns
new_column – The array containing the new feature.
- Return type
pandas.Series (bool)
-
alphapy.transforms.
adx
(f, p=14)¶ Calculate the Average Directional Index (ADX).
- Parameters
f (pandas.DataFrame) – Dataframe with all columns required for calculation. If you are applying ADX through vapply, then these columns are calculated automatically.
p (int) – The period over which to calculate the ADX.
- Returns
new_column – The array containing the new feature.
- Return type
pandas.Series (float)
References
The Average Directional Movement Index (ADX) was invented by J. Welles Wilder in 1978 [WIKI_ADX]. Its value reflects the strength of trend in any given instrument.
-
alphapy.transforms.
belowma
(f, c, p=50)¶ Determine those values of the dataframe that are below the moving average.
- Parameters
f (pandas.DataFrame) – Dataframe containing the column c.
c (str) – Name of the column in the dataframe f.
p (int) – The period of the moving average.
- Returns
new_column – The array containing the new feature.
- Return type
pandas.Series (bool)
-
alphapy.transforms.
c2max
(f, c1, c2)¶ Take the maximum value between two columns in a dataframe.
- Parameters
f (pandas.DataFrame) – Dataframe containing the two columns c1 and c2.
c1 (str) – Name of the first column in the dataframe f.
c2 (str) – Name of the second column in the dataframe f.
- Returns
max_val – The maximum value of the two columns.
- Return type
float
-
alphapy.transforms.
c2min
(f, c1, c2)¶ Take the minimum value between two columns in a dataframe.
- Parameters
f (pandas.DataFrame) – Dataframe containing the two columns c1 and c2.
c1 (str) – Name of the first column in the dataframe f.
c2 (str) – Name of the second column in the dataframe f.
- Returns
min_val – The minimum value of the two columns.
- Return type
float
-
alphapy.transforms.
diff
(f, c, n=1)¶ Calculate the n-th order difference for the given variable.
- Parameters
f (pandas.DataFrame) – Dataframe containing the column c.
c (str) – Name of the column in the dataframe f.
n (int) – The number of times that the values are differenced.
- Returns
new_column – The array containing the new feature.
- Return type
pandas.Series (float)
-
alphapy.transforms.
diminus
(f, p=14)¶ Calculate the Minus Directional Indicator (-DI).
- Parameters
f (pandas.DataFrame) – Dataframe with columns high and low.
p (int) – The period over which to calculate the -DI.
- Returns
new_column – The array containing the new feature.
- Return type
pandas.Series (float)
References
A component of the average directional index (ADX) that is used to measure the presence of a downtrend. When the -DI is sloping downward, it is a signal that the downtrend is getting stronger [IP_NDI].
-
alphapy.transforms.
diplus
(f, p=14)¶ Calculate the Plus Directional Indicator (+DI).
- Parameters
f (pandas.DataFrame) – Dataframe with columns high and low.
p (int) – The period over which to calculate the +DI.
- Returns
new_column – The array containing the new feature.
- Return type
pandas.Series (float)
References
A component of the average directional index (ADX) that is used to measure the presence of an uptrend. When the +DI is sloping upward, it is a signal that the uptrend is getting stronger [IP_PDI].
-
alphapy.transforms.
dminus
(f)¶ Calculate the Minus Directional Movement (-DM).
- Parameters
f (pandas.DataFrame) – Dataframe with columns high and low.
- Returns
new_column – The array containing the new feature.
- Return type
pandas.Series (float)
References
Directional movement is negative (minus) when the prior low minus the current low is greater than the current high minus the prior high. This so-called Minus Directional Movement (-DM) equals the prior low minus the current low, provided it is positive. A negative value would simply be entered as zero [SC_ADX].
-
alphapy.transforms.
dmplus
(f)¶ Calculate the Plus Directional Movement (+DM).
- Parameters
f (pandas.DataFrame) – Dataframe with columns high and low.
- Returns
new_column – The array containing the new feature.
- Return type
pandas.Series (float)
References
Directional movement is positive (plus) when the current high minus the prior high is greater than the prior low minus the current low. This so-called Plus Directional Movement (+DM) then equals the current high minus the prior high, provided it is positive. A negative value would simply be entered as zero [SC_ADX].
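The +DM and -DM rules quoted above translate directly into code. The following is a plain-Python sketch on lists rather than a dataframe; the helper name `directional_movement` is hypothetical, the inputs are assumed non-empty, and giving the first bar a value of 0.0 is an assumption because it has no prior bar.

```python
def directional_movement(highs, lows):
    """Return (+DM, -DM) lists; the first bar has no prior bar and gets 0.0."""
    plus_dm, minus_dm = [0.0], [0.0]
    for i in range(1, len(highs)):
        up_move = highs[i] - highs[i - 1]    # current high minus prior high
        down_move = lows[i - 1] - lows[i]    # prior low minus current low
        plus_dm.append(up_move if up_move > down_move and up_move > 0 else 0.0)
        minus_dm.append(down_move if down_move > up_move and down_move > 0 else 0.0)
    return plus_dm, minus_dm


plus_dm, minus_dm = directional_movement([10.0, 12.0, 11.0], [8.0, 9.0, 7.0])
print(plus_dm)   # [0.0, 2.0, 0.0]
print(minus_dm)  # [0.0, 0.0, 2.0]
```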
-
alphapy.transforms.
down
(f, c)¶ Find the negative values in the series.
- Parameters
f (pandas.DataFrame) – Dataframe containing the column c.
c (str) – Name of the column in the dataframe f.
- Returns
new_column – The array containing the new feature.
- Return type
pandas.Series (bool)
-
alphapy.transforms.
dpc
(f, c)¶ Get the negative values, with positive values zeroed.
- Parameters
f (pandas.DataFrame) – Dataframe with column c.
c (str) – Name of the column.
- Returns
new_column – The array containing the new feature.
- Return type
pandas.Series (float)
-
alphapy.transforms.
ema
(f, c, p=20)¶ Calculate the exponential moving average on a rolling basis.
- Parameters
f (pandas.DataFrame) – Dataframe containing the column c.
c (str) – Name of the column in the dataframe f.
p (int) – The period over which to calculate the exponential moving average.
- Returns
new_column – The array containing the new feature.
- Return type
pandas.Series (float)
References
An exponential moving average (EMA) is a type of moving average that is similar to a simple moving average, except that more weight is given to the latest data [IP_EMA].
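To make "more weight is given to the latest data" concrete, here is a small recursive EMA sketch. The smoothing factor 2 / (p + 1) and seeding with the first observation are common conventions and are assumptions here; AlphaPy's actual computation may differ. The name `ema_sketch` is hypothetical.

```python
def ema_sketch(values, p=20):
    """Exponential moving average with smoothing factor 2 / (p + 1)."""
    alpha = 2.0 / (p + 1)
    result = []
    for value in values:
        if not result:
            result.append(float(value))  # seed with the first observation
        else:
            # The newest value gets weight alpha; older history decays geometrically.
            result.append(alpha * value + (1.0 - alpha) * result[-1])
    return result


print(ema_sketch([1.0, 2.0, 3.0], p=3))  # [1.0, 1.5, 2.25]
```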
-
alphapy.transforms.
extract_bizday
(f, c)¶ Extract business day of month and week.
- Parameters
f (pandas.DataFrame) – Dataframe containing the date column c.
c (str) – Name of the date column in the dataframe f.
- Returns
date_features – The dataframe containing the date features.
- Return type
pandas.DataFrame
-
alphapy.transforms.
extract_date
(f, c)¶ Extract date into its components: year, month, day, dayofweek.
- Parameters
f (pandas.DataFrame) – Dataframe containing the date column c.
c (str) – Name of the date column in the dataframe f.
- Returns
date_features – The dataframe containing the date features.
- Return type
pandas.DataFrame
-
alphapy.transforms.
extract_time
(f, c)¶ Extract time into its components: hour, minute, second.
- Parameters
f (pandas.DataFrame) – Dataframe containing the time column c.
c (str) – Name of the time column in the dataframe f.
- Returns
time_features – The dataframe containing the time features.
- Return type
pandas.DataFrame
-
alphapy.transforms.
gap
(f)¶ Calculate the gap percentage between the current open and the previous close.
- Parameters
f (pandas.DataFrame) – Dataframe with columns open and close.
- Returns
new_column – The array containing the new feature.
- Return type
pandas.Series (float)
References
A gap is a break between prices on a chart that occurs when the price of a stock makes a sharp move up or down with no trading occurring in between [IP_GAP].
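A gap percentage compares each open with the previous bar's close. The sketch below works on plain lists; `gap_percent` is a hypothetical helper, and scaling the result by 100 is an assumption about the units.

```python
def gap_percent(opens, closes):
    """Return the open-versus-previous-close gap as a percentage per bar."""
    gaps = [None]  # the first bar has no previous close
    for i in range(1, len(opens)):
        gaps.append(100.0 * (opens[i] / closes[i - 1] - 1.0))
    return gaps


gaps = gap_percent([10.0, 10.5, 9.9], [10.0, 11.0, 10.0])
print(gaps)
```

A positive value indicates an open above the prior close (a gap up), and a negative value indicates a gap down.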
-
alphapy.transforms.
gapbadown
(f)¶ Determine whether or not there has been a breakaway gap down.
- Parameters
f (pandas.DataFrame) – Dataframe with columns open and low.
- Returns
new_column – The array containing the new feature.
- Return type
pandas.Series (bool)
References
A breakaway gap represents a gap in the movement of a stock price supported by levels of high volume [IP_BAGAP].
-
alphapy.transforms.
gapbaup
(f)¶ Determine whether or not there has been a breakaway gap up.
- Parameters
f (pandas.DataFrame) – Dataframe with columns open and high.
- Returns
new_column – The array containing the new feature.
- Return type
pandas.Series (bool)
References
A breakaway gap represents a gap in the movement of a stock price supported by levels of high volume [IP_BAGAP].
-
alphapy.transforms.
gapdown
(f)¶ Determine whether or not there has been a gap down.
- Parameters
f (pandas.DataFrame) – Dataframe with columns open and close.
- Returns
new_column – The array containing the new feature.
- Return type
pandas.Series (bool)
References
A gap is a break between prices on a chart that occurs when the price of a stock makes a sharp move up or down with no trading occurring in between [IP_GAP].
-
alphapy.transforms.
gapup
(f)¶ Determine whether or not there has been a gap up.
- Parameters
f (pandas.DataFrame) – Dataframe with columns open and close.
- Returns
new_column – The array containing the new feature.
- Return type
pandas.Series (bool)
References
A gap is a break between prices on a chart that occurs when the price of a stock makes a sharp move up or down with no trading occurring in between [IP_GAP].
-
alphapy.transforms.
gtval
(f, c1, c2)¶ Determine whether or not the first column of a dataframe is greater than the second.
- Parameters
f (pandas.DataFrame) – Dataframe containing the two columns c1 and c2.
c1 (str) – Name of the first column in the dataframe f.
c2 (str) – Name of the second column in the dataframe f.
- Returns
new_column – The array containing the new feature.
- Return type
pandas.Series (bool)
-
alphapy.transforms.
gtval0
(f, c1, c2)¶ For positive values in the first column of the dataframe that are greater than the second column, get the value in the first column, otherwise return zero.
- Parameters
f (pandas.DataFrame) – Dataframe containing the two columns c1 and c2.
c1 (str) – Name of the first column in the dataframe f.
c2 (str) – Name of the second column in the dataframe f.
- Returns
new_val – A positive value or zero.
- Return type
float
-
alphapy.transforms.
higher
(f, c, o=1)¶ Determine whether or not a series value is higher than the value o periods back.
- Parameters
f (pandas.DataFrame) – Dataframe containing the column c.
c (str) – Name of the column in the dataframe f.
o (int, optional) – Offset value for shifting the series.
- Returns
new_column – The array containing the new feature.
- Return type
pandas.Series (bool)
-
alphapy.transforms.
highest
(f, c, p=20)¶ Calculate the highest value on a rolling basis.
- Parameters
f (pandas.DataFrame) – Dataframe containing the column c.
c (str) – Name of the column in the dataframe f.
p (int) – The period over which to calculate the rolling maximum.
- Returns
new_column – The array containing the new feature.
- Return type
pandas.Series (float)
-
alphapy.transforms.
hlrange
(f, p=1)¶ Calculate the Range, the difference between High and Low.
- Parameters
f (pandas.DataFrame) – Dataframe with columns high and low.
p (int) – The period over which the range is calculated.
- Returns
new_column – The array containing the new feature.
- Return type
pandas.Series (float)
-
alphapy.transforms.
lower
(f, c, o=1)¶ Determine whether or not a series value is lower than the value o periods back.
- Parameters
f (pandas.DataFrame) – Dataframe containing the column c.
c (str) – Name of the column in the dataframe f.
o (int, optional) – Offset value for shifting the series.
- Returns
new_column – The array containing the new feature.
- Return type
pandas.Series (bool)
-
alphapy.transforms.
lowest
(f, c, p=20)¶ Calculate the lowest value on a rolling basis.
- Parameters
f (pandas.DataFrame) – Dataframe containing the column c.
c (str) – Name of the column in the dataframe f.
p (int) – The period over which to calculate the rolling minimum.
- Returns
new_column – The array containing the new feature.
- Return type
pandas.Series (float)
-
alphapy.transforms.
ma
(f, c, p=20)¶ Calculate the mean on a rolling basis.
- Parameters
f (pandas.DataFrame) – Dataframe containing the column c.
c (str) – Name of the column in the dataframe f.
p (int) – The period over which to calculate the rolling mean.
- Returns
new_column – The array containing the new feature.
- Return type
pandas.Series (float)
References
In statistics, a moving average (rolling average or running average) is a calculation to analyze data points by creating series of averages of different subsets of the full data set [WIKI_MA].
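For readers unfamiliar with rolling means, this is a minimal list-based sketch of a simple moving average over a window of p values. The helper name `rolling_mean` is hypothetical, and returning None until a full window is available is an assumption that mirrors the usual NaN padding.

```python
def rolling_mean(values, p=20):
    """Simple moving average; positions with fewer than p values yield None."""
    result = []
    for i in range(len(values)):
        if i + 1 < p:
            result.append(None)
        else:
            window = values[i + 1 - p : i + 1]
            result.append(sum(window) / p)
    return result


print(rolling_mean([1.0, 2.0, 3.0, 4.0], p=2))  # [None, 1.5, 2.5, 3.5]
```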
-
alphapy.transforms.
maratio
(f, c, p1=1, p2=10)¶ Calculate the ratio of two moving averages.
- Parameters
f (pandas.DataFrame) – Dataframe containing the column c.
c (str) – Name of the column in the dataframe f.
p1 (int) – The period of the first moving average.
p2 (int) – The period of the second moving average.
- Returns
new_column – The array containing the new feature.
- Return type
pandas.Series (float)
-
alphapy.transforms.
mval
(f, c)¶ Get the negative value, otherwise zero.
- Parameters
f (pandas.DataFrame) – Dataframe containing the column c.
c (str) – Name of the column in the dataframe f.
- Returns
new_val – Negative value or zero.
- Return type
float
-
alphapy.transforms.
net
(f, c='close', o=1)¶ Calculate the net change of a given column.
- Parameters
f (pandas.DataFrame) – Dataframe containing the column c.
c (str) – Name of the column in the dataframe f.
o (int, optional) – Offset value for shifting the series.
- Returns
new_column – The array containing the new feature.
- Return type
pandas.Series (float)
References
Net change is the difference between the closing price of a security on the day’s trading and the previous day’s closing price. Net change can be positive or negative and is quoted in terms of dollars [IP_NET].
-
alphapy.transforms.
netreturn
(f, c, o=1)¶ Calculate the net return, or Return On Investment (ROI).
- Parameters
f (pandas.DataFrame) – Dataframe containing the column c.
c (str) – Name of the column in the dataframe f.
o (int, optional) – Offset value for shifting the series.
- Returns
new_column – The array containing the new feature.
- Return type
pandas.Series (float)
References
ROI measures the amount of return on an investment relative to the original cost. To calculate ROI, the benefit (or return) of an investment is divided by the cost of the investment, and the result is expressed as a percentage or a ratio [IP_ROI].
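The relationship between net change and net return can be sketched on a plain list of closing prices. Both helper names are hypothetical, and expressing the return as a ratio rather than a percentage is an assumption.

```python
def net_change(values, o=1):
    """Difference between each value and the value o periods back."""
    return [None] * o + [values[i] - values[i - o] for i in range(o, len(values))]


def net_return(values, o=1):
    """Net change divided by the earlier value, expressed as a ratio."""
    return [None] * o + [
        (values[i] - values[i - o]) / values[i - o] for i in range(o, len(values))
    ]


closes = [100.0, 110.0, 99.0]
print(net_change(closes))  # [None, 10.0, -11.0]
print(net_return(closes))
```

Dividing by the earlier value is what turns a dollar-denominated change into a scale-free return.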
-
alphapy.transforms.
pchange1
(f, c, o=1)¶ Calculate the percentage change within the same variable.
- Parameters
f (pandas.DataFrame) – Dataframe containing the column c.
c (str) – Name of the column in the dataframe f.
o (int) – Offset to the previous value.
- Returns
new_column – The array containing the new feature.
- Return type
pandas.Series (float)
-
alphapy.transforms.
pchange2
(f, c1, c2)¶ Calculate the percentage change between two variables.
- Parameters
f (pandas.DataFrame) – Dataframe containing the two columns c1 and c2.
c1 (str) – Name of the first column in the dataframe f.
c2 (str) – Name of the second column in the dataframe f.
- Returns
new_column – The array containing the new feature.
- Return type
pandas.Series (float)
-
alphapy.transforms.
pval
(f, c)¶ Get the positive value, otherwise zero.
- Parameters
f (pandas.DataFrame) – Dataframe containing the column c.
c (str) – Name of the column in the dataframe f.
- Returns
new_val – Positive value or zero.
- Return type
float
-
alphapy.transforms.
rindex
(f, ci, ch, cl, p=1)¶ Calculate the range index spanning a given period p.
The range index is a number between 0 and 100 that relates the value of the index column ci to the high column ch and the low column cl. For example, if the low value of the range is 10 and the high value is 20, then the range index for a value of 15 would be 50%. The range index for 18 would be 80%.
- Parameters
f (pandas.DataFrame) – Dataframe containing the columns ci, ch, and cl.
ci (str) – Name of the index column in the dataframe f.
ch (str) – Name of the high column in the dataframe f.
cl (str) – Name of the low column in the dataframe f.
p (int) – The period over which the range index of column ci is calculated.
- Returns
new_column – The array containing the new feature.
- Return type
pandas.Series (float)
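The range index example in the description (low 10, high 20) can be reproduced with a scalar sketch. The helper `range_index` is hypothetical and ignores the period p; returning 0.0 for a zero-width range is an assumption.

```python
def range_index(value, low, high):
    """Position of value within [low, high], scaled to 0-100."""
    if high == low:
        return 0.0  # degenerate range; the fallback value is an assumption
    return 100.0 * (value - low) / (high - low)


print(range_index(15, 10, 20))  # 50.0
print(range_index(18, 10, 20))  # 80.0
```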
-
alphapy.transforms.
rsi
(f, c, p=14)¶ Calculate the Relative Strength Index (RSI).
- Parameters
f (pandas.DataFrame) – Dataframe containing the column net.
c (str) – Name of the column in the dataframe f.
p (int) – The period over which to calculate the RSI.
- Returns
new_column – The array containing the new feature.
- Return type
pandas.Series (float)
References
Developed by J. Welles Wilder, the Relative Strength Index (RSI) is a momentum oscillator that measures the speed and change of price movements [SC_RSI].
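As a simplified illustration of the RSI formula, RSI = 100 - 100 / (1 + RS), the sketch below computes one RSI value from a non-empty window of net changes. It uses simple averages of gains and losses, which is an assumption: Wilder's original method smooths the averages, and AlphaPy's implementation may differ. The name `rsi_sketch` is hypothetical.

```python
def rsi_sketch(changes):
    """RSI over one window of net changes, using simple averages."""
    gains = [c for c in changes if c > 0]
    losses = [-c for c in changes if c < 0]
    avg_gain = sum(gains) / len(changes)
    avg_loss = sum(losses) / len(changes)
    if avg_loss == 0:
        return 100.0  # no losses in the window
    rs = avg_gain / avg_loss  # relative strength
    return 100.0 - 100.0 / (1.0 + rs)


print(rsi_sketch([1.0, -1.0, 1.0, -1.0]))  # 50.0
```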
-
alphapy.transforms.
rtotal
(vec)¶ Calculate the running total.
- Parameters
vec (pandas.Series) – The input array for calculating the running total.
- Returns
running_total – The final running total.
- Return type
int
Example
>>> vec.rolling(window=20).apply(rtotal)
-
alphapy.transforms.
runs
(vec)¶ Calculate the total number of runs.
- Parameters
vec (pandas.Series) – The input array for calculating the number of runs.
- Returns
runs_value – The total number of runs.
- Return type
int
Example
>>> vec.rolling(window=20).apply(runs)
-
alphapy.transforms.
runs_test
(f, c, wfuncs, window)¶ Perform a runs test on binary series.
- Parameters
f (pandas.DataFrame) – Dataframe containing the column c.
c (str) – Name of the column in the dataframe f.
wfuncs (list) – The set of runs test functions to apply to the column:
'all': Run all of the functions below.
'rtotal': The running total over the window period.
'runs': Total number of runs in window.
'streak': The length of the latest streak.
'zscore': The Z-Score over the window period.
window (int) – The rolling period.
- Returns
new_features – The dataframe containing the runs test features.
- Return type
pandas.DataFrame
References
For more information about runs tests for detecting non-randomness, refer to [RUNS].
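The runs, streak, and Z-score statistics can be sketched for a single binary window with the standard library. The function names are hypothetical and the Z-score uses the standard Wald-Wolfowitz runs test formulas; whether AlphaPy computes it exactly this way is an assumption. The input is assumed to hold at least two values.

```python
import itertools
import math


def count_runs(vec):
    """Number of maximal blocks of consecutive equal values."""
    return sum(1 for _ in itertools.groupby(vec))


def latest_streak(vec):
    """Length of the final block of consecutive equal values."""
    return [len(list(group)) for _, group in itertools.groupby(vec)][-1]


def runs_zscore(vec):
    """Z-score of the observed run count under the randomness hypothesis."""
    n1 = sum(1 for v in vec if v)
    n2 = len(vec) - n1
    n = n1 + n2
    expected = 2.0 * n1 * n2 / n + 1.0
    variance = (2.0 * n1 * n2) * (2.0 * n1 * n2 - n) / (n * n * (n - 1.0))
    if variance <= 0:
        return 0.0  # all values identical; the fallback is an assumption
    return (count_runs(vec) - expected) / math.sqrt(variance)


vec = [1, 1, 0, 0, 1, 1, 0, 0]
print(count_runs(vec), latest_streak(vec), round(runs_zscore(vec), 3))  # 4 2 -0.764
```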
-
alphapy.transforms.
split_to_letters
(f, c)¶ Separate text into distinct characters.
- Parameters
f (pandas.DataFrame) – Dataframe containing the column c.
c (str) – Name of the text column in the dataframe f.
- Returns
new_feature – The array containing the new feature.
- Return type
pandas.Series
Example
The value ‘abc’ becomes ‘a b c’.
-
alphapy.transforms.
streak
(vec)¶ Determine the length of the latest streak.
- Parameters
vec (pandas.Series) – The input array for calculating the latest streak.
- Returns
latest_streak – The length of the latest streak.
- Return type
int
Example
>>> vec.rolling(window=20).apply(streak)
-
alphapy.transforms.
texplode
(f, c)¶ Get dummy values for a text column.
- Parameters
f (pandas.DataFrame) – Dataframe containing the column c.
c (str) – Name of the text column in the dataframe f.
- Returns
dummies – The dataframe containing the dummy variables.
- Return type
pandas.DataFrame
Example
This function is useful for columns that appear to have separate character codes but are consolidated into a single column. Here, the column c is transformed into five dummy variables.

===  ===  ===  ===  ===  ===
c    0_a  1_x  1_b  2_x  2_z
===  ===  ===  ===  ===  ===
abz  1    0    1    0    1
abz  1    0    1    0    1
axx  1    1    0    1    0
abz  1    0    1    0    1
axz  1    1    0    0    1
===  ===  ===  ===  ===  ===
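The dummy-variable expansion in this example can be reproduced with a short standard-library sketch. The helper `explode_text` is hypothetical and returns a dictionary of columns instead of a dataframe; naming each column as position, underscore, character follows the example, while sorting the column names is an assumption.

```python
def explode_text(values):
    """Map each (position, character) pair to a 0/1 indicator column."""
    names = sorted({f"{i}_{ch}" for text in values for i, ch in enumerate(text)})
    table = {name: [] for name in names}
    for text in values:
        present = {f"{i}_{ch}" for i, ch in enumerate(text)}
        for name in names:
            table[name].append(1 if name in present else 0)
    return table


dummies = explode_text(["abz", "abz", "axx", "abz", "axz"])
for name, column in dummies.items():
    print(name, column)
```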
-
alphapy.transforms.
truehigh
(f)¶ Calculate the True High value.
- Parameters
f (pandas.DataFrame) – Dataframe with columns high and low.
- Returns
new_column – The array containing the new feature.
- Return type
pandas.Series (float)
References
Today’s high, or the previous close, whichever is higher [TS_TR].
-
alphapy.transforms.
truelow
(f)¶ Calculate the True Low value.
- Parameters
f (pandas.DataFrame) – Dataframe with columns high and low.
- Returns
new_column – The array containing the new feature.
- Return type
pandas.Series (float)
References
Today’s low, or the previous close, whichever is lower [TS_TR].
-
alphapy.transforms.
truerange
(f)¶ Calculate the True Range value.
- Parameters
f (pandas.DataFrame) – Dataframe with columns high and low.
- Returns
new_column – The array containing the new feature.
- Return type
pandas.Series (float)
References
True High - True Low [TS_TR].
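The three definitions above (True High, True Low, and True Range) can be sketched together as follows. The function name true_range_sketch is hypothetical, and the sketch assumes the dataframe also has a close column, since the definitions refer to the previous close.

```python
import pandas as pd


def true_range_sketch(f):
    """Return (true_high, true_low, true_range) from high, low, and close columns."""
    prev_close = f["close"].shift(1)
    # On the first row there is no previous close, so max/min fall back to high/low.
    true_high = pd.concat([f["high"], prev_close], axis=1).max(axis=1)
    true_low = pd.concat([f["low"], prev_close], axis=1).min(axis=1)
    return true_high, true_low, true_high - true_low


bars = pd.DataFrame({
    "high": [10.0, 12.0, 9.0],
    "low": [8.0, 11.0, 7.0],
    "close": [9.0, 11.5, 8.0],
})
t_high, t_low, t_range = true_range_sketch(bars)
```

In the third bar the price gaps down, so the previous close (11.5) replaces the high (9.0) as the True High.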
-
alphapy.transforms.
up
(f, c)¶ Find the positive values in the series.
- Parameters
f (pandas.DataFrame) – Dataframe containing the column c.
c (str) – Name of the column in the dataframe f.
- Returns
new_column – The array containing the new feature.
- Return type
pandas.Series (bool)
-
alphapy.transforms.
upc
(f, c)¶ Get the positive values, with negative values zeroed.
- Parameters
f (pandas.DataFrame) – Dataframe with column c.
c (str) – Name of the column.
- Returns
new_column – The array containing the new feature.
- Return type
pandas.Series (float)
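The difference between the boolean result of up and the float result of upc can be seen with plain pandas operations. This is only an approximation of the documented behavior, and the column name net is a made-up example.

```python
import pandas as pd

frame = pd.DataFrame({"net": [1.5, -0.5, 0.0, 2.0]})

# Boolean flags for strictly positive values, as described for up
is_up = frame["net"] > 0

# Positive values kept and negative values replaced by zero, as described for upc
up_only = frame["net"].clip(lower=0)
```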
-
alphapy.transforms.
xmadown
(f, c='close', pfast=20, pslow=50)¶ Determine those values of the dataframe that are below the moving average.
- Parameters
f (pandas.DataFrame) – Dataframe containing the column c.
c (str, optional) – Name of the column in the dataframe f.
pfast (int, optional) – The period of the fast moving average.
pslow (int, optional) – The period of the slow moving average.
- Returns
new_column – The array containing the new feature.
- Return type
pandas.Series (bool)
References
In the statistics of time series, and in particular the analysis of financial time series for stock trading purposes, a moving-average crossover occurs when, on plotting two moving averages each based on different degrees of smoothing, the traces of these moving averages cross [WIKI_XMA].
-
alphapy.transforms.
xmaup
(f, c='close', pfast=20, pslow=50)¶ Determine those values of the dataframe that are above the moving average.
- Parameters
f (pandas.DataFrame) – Dataframe containing the column c.
c (str, optional) – Name of the column in the dataframe f.
pfast (int, optional) – The period of the fast moving average.
pslow (int, optional) – The period of the slow moving average.
- Returns
new_column – The array containing the new feature.
- Return type
pandas.Series (bool)
References
In the statistics of time series, and in particular the analysis of financial time series for stock trading purposes, a moving-average crossover occurs when, on plotting two moving averages each based on different degrees of smoothing, the traces of these moving averages cross [WIKI_XMA].
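The crossover idea described in the reference can be sketched with two rolling means. The function name crossover_flags is hypothetical, simple (unweighted) moving averages are assumed, and the exact conditions used by xmaup and xmadown may differ from this sketch.

```python
import pandas as pd


def crossover_flags(f, c="close", pfast=20, pslow=50):
    """Flag rows where the fast moving average crosses the slow one."""
    fast = f[c].rolling(pfast).mean()
    slow = f[c].rolling(pslow).mean()
    # Comparisons against NaN are False, so incomplete windows never flag a cross.
    cross_up = (fast > slow) & (fast.shift(1) <= slow.shift(1))
    cross_down = (fast < slow) & (fast.shift(1) >= slow.shift(1))
    return cross_up, cross_down


prices = pd.DataFrame(
    {"close": [5, 4, 3, 2, 1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1]}
)
cross_up, cross_down = crossover_flags(prices, pfast=2, pslow=4)
```

With these toy prices, the fast average rises through the slow average once on the way up and falls through it once on the way down.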
-
alphapy.transforms.
zscore
(vec)¶ Calculate the Z-Score.
- Parameters
vec (pandas.Series) – The input array for calculating the Z-Score.
- Returns
zscore – The value of the Z-Score.
- Return type
float
References
For more information about calculating the Z-Score, refer to [ZSCORE].
Example
>>> vec.rolling(window=20).apply(zscore)
alphapy.utilities module¶
-
alphapy.utilities.
get_datestamp
()¶ Returns today’s datestamp.
- Returns
datestamp – The valid date string in YYYY-mm-dd format.
- Return type
str
-
alphapy.utilities.
most_recent_file
(directory, file_spec)¶ Find the most recent file in a directory.
- Parameters
directory (str) – Full directory specification.
file_spec (str) – Wildcard search string for the file to locate.
- Returns
file_name – Name of the file to read, excluding the extension.
- Return type
str
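One plausible way to locate the most recent file is to match the wildcard with glob and pick the entry with the latest modification time. The helper name newest_matching_file, the use of modification time as the recency criterion, and the None result for no match are all assumptions of this sketch.

```python
import glob
import os
import tempfile


def newest_matching_file(directory, file_spec):
    """Return the newest file matching file_spec, without its extension."""
    matches = glob.glob(os.path.join(directory, file_spec))
    if not matches:
        return None
    newest = max(matches, key=os.path.getmtime)
    return os.path.splitext(os.path.basename(newest))[0]


# Demonstration in a throwaway directory with explicit modification times
with tempfile.TemporaryDirectory() as tmp:
    for offset, name in enumerate(["model_old.csv", "model_new.csv"]):
        path = os.path.join(tmp, name)
        open(path, "w").close()
        os.utime(path, (1_000_000 + offset, 1_000_000 + offset))
    newest_name = newest_matching_file(tmp, "model_*.csv")
    missing_name = newest_matching_file(tmp, "*.txt")
```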
-
alphapy.utilities.
np_store_data
(data, dir_name, file_name, extension, separator)¶ Store NumPy data in a file.
- Parameters
data (numpy array) – The model component to store
dir_name (str) – Full directory specification.
file_name (str) – Name of the file to read, excluding the extension.
extension (str) – File name extension, e.g., csv.
separator (str) – The delimiter between fields in the file.
- Returns
None
- Return type
None
-
alphapy.utilities.
remove_list_items
(elements, alist)¶ Remove one or more items from the given list.
- Parameters
elements (list) – The items to remove from the list alist.
alist (list) – Any object of any type can be a list item.
- Returns
sublist – The subset of items after removal.
- Return type
list
Examples
>>> test_list = ['a', 'b', 'c', test_func]
>>> remove_list_items([test_func], test_list) # ['a', 'b', 'c']
-
alphapy.utilities.
subtract_days
(date_string, ndays)¶ Subtract a number of days from a given date.
- Parameters
date_string (str) – An alphanumeric string in the format %Y-%m-%d.
ndays (int) – Number of days to subtract.
- Returns
new_date_string – The adjusted date string in the format %Y-%m-%d.
- Return type
str
Examples
>>> subtract_days('2017-11-10', 31) # '2017-10-10'
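The date arithmetic in this example can be checked with the standard library. The function name subtract_days_sketch is hypothetical; the sketch only assumes the documented %Y-%m-%d format for both input and output.

```python
from datetime import datetime, timedelta


def subtract_days_sketch(date_string, ndays):
    """Parse a %Y-%m-%d string, subtract ndays, and format the result."""
    date = datetime.strptime(date_string, "%Y-%m-%d")
    return (date - timedelta(days=ndays)).strftime("%Y-%m-%d")


earlier = subtract_days_sketch("2017-11-10", 31)
```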
-
alphapy.utilities.
valid_date
(date_string)¶ Determine whether or not the given string is a valid date.
- Parameters
date_string (str) – An alphanumeric string in the format %Y-%m-%d.
- Returns
date_string – The valid date string.
- Return type
str
- Raises
ValueError – Not a valid date.
Examples
>>> valid_date('2016-7-1') # datetime.datetime(2016, 7, 1, 0, 0)
>>> valid_date('345') # ValueError: Not a valid date
-
alphapy.utilities.
valid_name
(name)¶ Determine whether or not the given string is a valid alphanumeric string.
- Parameters
name (str) – An alphanumeric identifier.
- Returns
result – True if the name is valid, else False.
- Return type
bool
Examples
>>> valid_name('alpha') # True
>>> valid_name('!alpha') # False
alphapy.variables module¶
-
class
alphapy.variables.
Variable
(name, expr, replace=False)¶ Bases: object
Create a new variable as a key-value pair. All variables are stored in Variable.variables. Duplicate keys or values are not allowed, unless the replace parameter is True.
- Parameters
name (str) – Variable key.
expr (str) – Variable value.
replace (bool, optional) – Replace the current key-value pair if it already exists.
- Variables
variables (dict) – Class variable for storing all known variables
Examples
>>> Variable('rrunder', 'rr_3_20 <= 0.9')
>>> Variable('hc', 'higher_close')
-
variables
= {}¶
-
alphapy.variables.
allvars
(expr)¶ Get the list of valid names in the expression.
- Parameters
expr (str) – A valid expression conforming to the Variable Definition Language.
- Returns
vlist – List of valid variable names.
- Return type
list
-
alphapy.variables.
vapply
(group, vname, vfuncs=None)¶ Apply a variable to multiple dataframes.
- Parameters
group (alphapy.Group) – The input group.
vname (str) – The variable to apply to the group.
vfuncs (dict, optional) – Dictionary of external modules and functions.
- Returns
None
- Return type
None
- Other Parameters
Frame.frames (dict) – Global dictionary of dataframes
See also
-
alphapy.variables.
vexec
(f, v, vfuncs=None)¶ Add a variable to the given dataframe.
This is the core function for adding a variable to a dataframe. The default variable functions are already defined locally in alphapy.transforms; however, you may want to define your own variable functions. If so, then the vfuncs parameter will contain the modules and functions to be imported and applied by the vexec function.
A custom variable function must accept a pandas DataFrame as an input parameter and must return a pandas DataFrame with the new variable(s).
- Parameters
f (pandas.DataFrame) – Dataframe to contain the new variable.
v (str) – Variable to add to the dataframe.
vfuncs (dict, optional) – Dictionary of external modules and functions.
- Returns
f – Dataframe with the new variable.
- Return type
pandas.DataFrame
- Other Parameters
Variable.variables (dict) – Global dictionary of variables
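A user-defined variable function that follows the contract described above (DataFrame in, DataFrame of new variables out) might look like the sketch below. The function name range_pct and the high, low, and close column names are hypothetical, and the sketch does not show how the function would be registered through vfuncs.

```python
import pandas as pd


def range_pct(f):
    """Hypothetical custom variable: the bar's range as a fraction of its close."""
    new_features = pd.DataFrame(index=f.index)
    new_features["range_pct"] = (f["high"] - f["low"]) / f["close"]
    return new_features


bars = pd.DataFrame({
    "high": [10.0, 12.0],
    "low": [8.0, 9.0],
    "close": [9.0, 12.0],
})
features = range_pct(bars)
```

Returning a new dataframe that shares the input index keeps the new columns aligned with the original rows.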
-
alphapy.variables.
vmapply
(group, vs, vfuncs=None)¶ Apply multiple variables to multiple dataframes.
- Parameters
group (alphapy.Group) – The input group.
vs (list) – The list of variables to apply to the group.
vfuncs (dict, optional) – Dictionary of external modules and functions.
- Returns
None
- Return type
None
See also
-
alphapy.variables.
vmunapply
(group, vs)¶ Remove a list of variables from multiple dataframes.
- Parameters
group (alphapy.Group) – The input group.
vs (list) – The list of variables to remove from the group.
- Returns
None
- Return type
None
See also
-
alphapy.variables.
vparse
(vname)¶ Parse a variable name into its respective components.
- Parameters
vname (str) – The name of the variable.
- Returns
vxlag (str) – Variable name without the lag component.
root (str) – The base variable name without the parameters.
plist (list) – The parameter list.
lag (int) – The offset starting with the current value [0] and counting back, e.g., an offset [1] means the previous value of the variable.
Notes
AlphaPy makes feature creation easy. The syntax of a variable name maps to a function call:
xma_20_50 => xma(20, 50)
Examples
>>> vparse('xma_20_50[1]') # ('xma_20_50', 'xma', ['20', '50'], 1)
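The name-to-call mapping described in the notes can be imitated with a small parser. This is a sketch only: the name parse_variable_name is hypothetical, it treats everything after the first underscore as parameters, and it accepts only a non-negative integer lag in square brackets.

```python
import re


def parse_variable_name(vname):
    """Split a name such as 'xma_20_50[1]' into (vxlag, root, plist, lag)."""
    match = re.fullmatch(r"(?P<body>[^\[\]]+)(?:\[(?P<lag>\d+)\])?", vname)
    if match is None:
        raise ValueError("Cannot parse variable name: {}".format(vname))
    vxlag = match.group("body")
    lag = int(match.group("lag") or 0)  # no brackets means the current value
    root, *plist = vxlag.split("_")
    return vxlag, root, plist, lag


parsed = parse_variable_name("xma_20_50[1]")
plain = parse_variable_name("close")
```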
-
alphapy.variables.
vsub
(v, expr)¶ Substitute the variable parameters into the expression.
This function performs the parameter substitution when applying features to a dataframe. It is a mechanism for the user to override the default values in any given expression when defining a feature, instead of having to programmatically call a function with new values.
- Parameters
v (str) – Variable name.
expr (str) – The expression for substitution.
- Returns
newexpr – The expression with the new, substituted values.
- Return type
str
-
alphapy.variables.
vtree
(vname)¶ Get all of the antecedent variables.
Before applying a variable to a dataframe, we have to recursively get all of the child variables, beginning with the starting variable’s expression. Then, we have to extract the variables from all the subsequent expressions. This process continues until all antecedent variables are obtained.
- Parameters
vname (str) – A valid variable stored in Variable.variables.
- Returns
all_variables – The variables that need to be applied before vname.
- Return type
list
- Other Parameters
Variable.variables (dict) – Global dictionary of variables
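The recursive collection of antecedents described above can be sketched as a depth-first walk over a dictionary of expressions. Here the dictionary definitions stands in for Variable.variables, the function name antecedents and the signal variable are hypothetical, and names are extracted with a simple identifier regex, so operator keywords such as "and" would be mistaken for variables.

```python
import re


def antecedents(vname, definitions):
    """List the variables that vname depends on, dependencies first."""
    ordered = []

    def visit(name, path):
        expr = definitions.get(name)
        if expr is None:
            return  # not a defined variable, so nothing to expand
        if name in path:
            raise ValueError("Circular definition involving {}".format(name))
        for child in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", expr):
            visit(child, path | {name})
            if child not in ordered:
                ordered.append(child)

    visit(vname, frozenset())
    return ordered


definitions = {
    "rrunder": "rr_3_20 <= 0.9",
    "hc": "higher_close",
    "signal": "rrunder & hc",
}
needed = antecedents("signal", definitions)
```

Because children are appended after their own dependencies, applying the variables in list order satisfies every dependency before it is used.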
-
alphapy.variables.
vunapply
(group, vname)¶ Remove a variable from multiple dataframes.
- Parameters
group (alphapy.Group) – The input group.
vname (str) – The variable to remove from the group.
- Returns
None
- Return type
None
- Other Parameters
Frame.frames (dict) – Global dictionary of dataframes
See also