AlphaPy is a machine learning framework for both speculators and
data scientists. It is written in Python with the
pandas libraries, as well as many other helpful libraries
for feature engineering and visualization. Here are just some of the
things you can do with AlphaPy:
Run machine learning models using
Create models for analyzing the markets with MarketFlow.
Predict sporting events with SportFlow.
Develop trading systems and analyze portfolios using MarketFlow and Quantopian’s
alphapy package is the base platform. The domain pipelines
mflow) and SportFlow (
sflow) run on top of
alphapy. As shown in the diagram below, we separate the domain
pipeline from the model pipeline. The main job of a domain pipeline
is to transform the raw application data into canonical form, i.e.,
a training set and a testing set. The model pipeline is flexible
enough to handle any project and evolved over many Kaggle
Let’s review all of the components in the diagram:
This is the Python code that creates the standard training and testing data. For example, you may be combining different data frames or collecting time series data from an external feed. These data are transformed for input into the model pipeline.
AlphaPy uses configuration files written in YAML to give the data scientist maximum flexibility. Typically, you will have a standard YAML template for each domain or application.
The training data is an external file that is read as a pandas dataframe. For classification, one of the columns will represent the target or dependent variable.
The testing data is an external file that is read as a pandas dataframe. For classification, the labels may or may not be included.
This Python code is generic for running all classification or regression models. The pipeline begins with data and ends with a model object for new predictions.
The configuration file has specific sections for running the model pipeline. Every aspect of creating a model is controlled through this file.
All models are saved to disk. You can load and run your trained model on new data in scoring mode.
AlphaPy has been developed primarily for supervised learning tasks. You can generate models for any classification or regression problem.
Binary Classification: classify elements into one of two groups
Multiclass Classification: classify elements into multiple categories
Regression: predict real values based on derived coefficients
Support Vector Machine (including Linear)
Naive Bayes (including Multinomial)
Radial Basis Functions
XGBoost Binary and Multiclass
AlphaPy relies on a number of key packages in both its model and domain pipelines. Although most packages are included in the Anaconda Python platform, most of the following packages are not, so please refer to the Web or Github site for further information.