Overview of Use Case
Predict Crypto is a small project, created by a student at the University of Colorado Boulder, that aims to predict, trade and do research on the cryptocurrency markets. The project collects data from a broad range of markets and data sources, and uses that information to perform predictive modeling and initiate trades. Based on the outcomes of the predictions, the trading strategy distributes the capital across the assets with the most favorable outlook. If the outlook for the market as a whole is not favorable, the funds are instead moved into the cryptocurrency "USDT", a stablecoin designed to track the value of the US Dollar, in order to mitigate losses associated with the natural volatility of the cryptocurrency markets.
Describe the business challenge or problem you needed to solve
We were looking to create a fully automated solution capable of dynamically pulling the latest data from a database, producing a brand new set of predictive models on a daily basis, and leveraging those models to create new predictions every hour and execute real trades on the live cryptocurrency markets. When it comes to navigating the incredibly volatile world of cryptocurrencies, the most encouraged strategy is to "HODL" (Hold On for Dear Life), or in other words to put the investment away and forget about it for a couple of years. During those years there are sure to be several crazy upward and downward price surges, and not all of them are going to be easy to predict. Perhaps, though, there are patterns within the markets that allow us to make medium-term predictions and cash in on those trends, outperforming a simple HODL strategy. Testing that idea is the objective of the project.
Describe your working solution
The process in Alteryx from beginning to end is outlined in the graphic below:
1. Extract Data:
- Our database is hosted on Google Cloud Platform and is a MySQL database
- For the predictive modeling, we will need to pull a "train" dataset, which will consist of all the data we collected relating to the cryptocurrency markets in the past, and we will limit the data to be from the past 100 days in order to keep the dataset to a reasonable size:
- When we intend to leverage the models after they are created in the later steps, we will adjust this query to pull data from the latest hour instead of the latest 100 days, using a field created specifically for this reason called "pkDummy":
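The two queries described in step 1 can be sketched roughly as follows. This is only an illustration, not the query used in the workflow: the table name `market_data`, the column name `date_time`, and the assumption that "pkDummy" holds the collection hour as a `YYYY-MM-DD HH` string are all hypothetical. The functions return a parameterized SQL string plus its parameters, which could then be passed to any MySQL client.

```python
from datetime import datetime, timedelta

TABLE = "market_data"  # hypothetical table name, not from the original workflow


def train_query(now, days=100):
    """Build the 'train' query: everything collected in the past `days` days."""
    cutoff = now - timedelta(days=days)
    # `date_time` is an assumed column name for the collection timestamp.
    sql = f"SELECT * FROM {TABLE} WHERE date_time >= %s"
    return sql, (cutoff.strftime("%Y-%m-%d %H:%M:%S"),)


def latest_hour_query(now):
    """Build the 'live' query: only the rows collected during the current hour."""
    # Assumes pkDummy stores the hour as 'YYYY-MM-DD HH'; the real format may differ.
    sql = f"SELECT * FROM {TABLE} WHERE pkDummy = %s"
    return sql, (now.strftime("%Y-%m-%d %H"),)


if __name__ == "__main__":
    now = datetime(2019, 11, 23, 15, 0)
    print(train_query(now))
    print(latest_hour_query(now))
```

Keeping both queries behind the same interface means the rest of the workflow does not need to know whether it is running in training mode or in live prediction mode.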
2. Data Prep:
- Data is extracted from several different sources within the database using the methodology outlined in step 1, and some data manipulation is performed to join all the data together and prepare the dataset for the predictive modeling step:
- It's not worth taking too deep of a dive into the specifics of the data preparation, but it mainly consists of two things: calculating the target variable, in this case the % change in USD price over the next 6-hour period for each individual cryptocurrency, and joining data across the different data sources, matching both on when the data was collected and on which asset it describes.
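To make the target variable concrete, here is a minimal sketch of how the % change over the next 6 hours could be computed for each asset. The field names `symbol`, `date_time` and `price_usd` are assumptions made for this example, and the sketch assumes an observation exists exactly 6 hours later; the actual data preparation is done with Alteryx tools and may handle gaps differently.

```python
from datetime import datetime, timedelta


def add_target(rows, horizon_hours=6):
    """Attach the % change in USD price over the next `horizon_hours` to each row.

    rows: list of dicts with 'symbol', 'date_time' (datetime) and 'price_usd'.
    Rows without an observation exactly `horizon_hours` later are dropped,
    because their target is not known yet.
    """
    # Index prices by (asset, timestamp) so the future price is a direct lookup.
    lookup = {(r["symbol"], r["date_time"]): r["price_usd"] for r in rows}
    horizon = timedelta(hours=horizon_hours)
    out = []
    for r in rows:
        future = lookup.get((r["symbol"], r["date_time"] + horizon))
        if future is None:
            continue
        target = (future - r["price_usd"]) * 100.0 / r["price_usd"]
        out.append({**r, "target_pct_change": target})
    return out


if __name__ == "__main__":
    rows = [
        {"symbol": "BTC", "date_time": datetime(2019, 11, 23, 15), "price_usd": 100.0},
        {"symbol": "BTC", "date_time": datetime(2019, 11, 23, 21), "price_usd": 105.0},
    ]
    for row in add_target(rows):
        print(row["symbol"], row["date_time"], row["target_pct_change"])
```

Matching on both the asset and the timestamp mirrors the join logic described above, where data sources are combined on when the data was collected and for what asset.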
3. Make Predictions:
- Because the trades are executed on multiple exchanges, we create individual models for each exchange. Each exchange has an independent model created using the XGBoost framework programmed in R using the R developer tool. Additional models using the R Predictive tools found within Alteryx are also created:
- The XGBoost model itself is outside the scope of this tutorial, but see the Related Resources section at the end of this article for an excellent practical example of how to create an XGBoost model with automated hyperparameter tuning in R.
4. Execute Trades:
- Using the models created in step 3, predictions on the latest/live data are made. Before actually performing the trades, the predictions get finalized applying an 80% weight to the predictions made by the XGBoost models, and 20% weight to the predictions made by the Alteryx models, adjusting them as necessary based on observed performance. After making the necessary adjustments to the predicted values, the data is passed to a Python tool, which leverages the Shrimpy developer API to execute trades on each of the exchanges individually:
- We use the Shrimpy platform to execute our trades because it can be challenging to connect to the individual exchanges through different APIs, and the Shrimpy.io service allows us to connect to several different exchanges using the same API and methodology. There are also additional benefits to using the service, such as "rebalancing", and allowing other users to automatically follow the same portfolio as the one doing the automated trading. Go to the end of the article to the related resources section for more information about the very awesome Shrimpy.io service/platform.
- Once we repeat the process for several exchanges independently, this is what the workflow that performs trades on the latest data ends up looking like:
The yellow container has e-mail alerts that get triggered if the data is not flowing through as expected
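The 80%/20% weighting from step 4, together with the fallback to USDT described in the overview, can be sketched as below. The blending weights come from the article; everything about the allocation rule (keeping only the top few assets with a positive prediction and weighting them in proportion to the predicted change) is an assumption made for illustration and is not necessarily how the project sizes its positions. The call to the Shrimpy API is deliberately left out.

```python
def blend(xgb_pred, alteryx_pred, xgb_weight=0.8):
    """Combine two predictions: 80% XGBoost and 20% Alteryx models by default."""
    return xgb_weight * xgb_pred + (1.0 - xgb_weight) * alteryx_pred


def allocate(blended, top_n=3, safe_asset="USDT"):
    """Turn blended predictions into portfolio fractions that sum to 1.

    blended: dict mapping asset symbol -> predicted % change.
    Hypothetical rule: keep the `top_n` assets with a positive prediction and
    weight them proportionally; if none is positive, hold only the safe asset.
    """
    positive = {s: p for s, p in blended.items() if p > 0}
    if not positive:
        return {safe_asset: 1.0}
    best = sorted(positive, key=positive.get, reverse=True)[:top_n]
    total = sum(positive[s] for s in best)
    return {s: positive[s] / total for s in best}


if __name__ == "__main__":
    xgb = {"BTC": 2.0, "ETH": 1.0, "LTC": -1.5}
    alteryx = {"BTC": 1.0, "ETH": -0.5, "LTC": -0.5}
    blended = {s: blend(xgb[s], alteryx[s]) for s in xgb}
    print(blended)
    print(allocate(blended))
```

The resulting fractions are the kind of target allocation that would then be handed to the Python tool that places the trades on each exchange.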
5. Compare Model Performance:
- Once the predictions are made, they are written to a database table, which serves as a centralized location we can later reference to compare the predicted values vs. actuals:
- In this step we create new fields that record the DateTime at which the prediction period ends. For example, if a prediction was made at 3:00PM on November 23rd 2019, we would be predicting the % price change against the dollar from 3:00PM to 9:00PM.
- After writing the predictions on November 23rd 2019 at 3:00PM (just an example), once 6 hours have passed, we can pick up the data using the Prediction End Date Time field, compare those predictions to the real-life results, and write the data to a table in the database for further analysis:
- Once that process is complete, the performance vs. actuals are visualized as interactive dashboards for internal analysis using an instance of Metabase running on AWS that works very similarly to how a Tableau Online site would work:
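The comparison logic in step 5 can be illustrated with a short sketch. The dictionary keys (`symbol`, `made_at`, `predicted_pct`) and the two summary fields (`error`, `direction_correct`) are hypothetical names chosen for this example; the article does not specify which metrics are stored in the database or shown in the dashboards.

```python
from datetime import datetime, timedelta


def prediction_end(made_at, horizon_hours=6):
    """Return the DateTime at which a prediction made at `made_at` can be checked."""
    return made_at + timedelta(hours=horizon_hours)


def compare(predictions, actuals, horizon_hours=6):
    """Join predictions to actual results once the prediction period has ended.

    predictions: list of dicts with 'symbol', 'made_at', 'predicted_pct'.
    actuals: dict mapping (symbol, end datetime) -> actual % change.
    Predictions whose actual result is not available yet are skipped.
    """
    results = []
    for p in predictions:
        end = prediction_end(p["made_at"], horizon_hours)
        actual = actuals.get((p["symbol"], end))
        if actual is None:
            continue
        results.append({
            "symbol": p["symbol"],
            "prediction_end": end,
            "predicted_pct": p["predicted_pct"],
            "actual_pct": actual,
            "error": p["predicted_pct"] - actual,
            # True when the predicted and actual moves have the same sign.
            "direction_correct": (p["predicted_pct"] > 0) == (actual > 0),
        })
    return results


if __name__ == "__main__":
    made_at = datetime(2019, 11, 23, 15, 0)
    preds = [{"symbol": "BTC", "made_at": made_at, "predicted_pct": 1.5}]
    actuals = {("BTC", datetime(2019, 11, 23, 21, 0)): 1.0}
    print(compare(preds, actuals))
```

Using the prediction end time as the join key means the comparison can run every hour and will only pick up predictions whose 6-hour window has actually closed.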
Describe the benefits you have achieved
Alteryx provides the best framework for this project by all accounts, and the process would be incredibly challenging to maintain through R or Python alone. Because of the huge time savings from a development perspective that are achieved through the usage of Alteryx, we are able to create a wide variety of tests and organize them and enable/disable different ones incredibly easily. Another thing that can be challenging using traditional tools is passing data from R to Python, and although the same results can be achieved using R Studio and the Reticulate package, being able to connect the data stream between tools in Alteryx provides a more powerful solution. Having the project built out of Alteryx means there are no limitations to our data capabilities as the project evolves and creates a huge amount of possibilities in terms of what can be achieved in a day compared to what could be done programming in R or Python.
We are working on a web page that will provide a hands-on programming tutorial as well as a more in-depth explanation and will add that here once that has been finalized. The company website for the project is https://www.predictcrypto.com/
Some of our favorite articles from the Shrimpy blog: