Hello Ries9112,
This is an awesome idea and great work. I have been trading currencies on and off for the past couple of years, and I am very much interested in this program.
I am wondering if it can be done for other, non-crypto currencies.
Is there any way to get the actual Alteryx process? How can I get more info?
Thank you,
Hi @koheny01, thanks for stopping by!
In terms of applying this process to trading on the forex markets, you wouldn't be able to re-purpose anything from the workflows themselves, and the trading step happens through a platform called Shrimpy, which only supports cryptocurrency exchanges. The general process would be the same, though, so I am going to outline that instead:
1. Extract Data: pull the latest market data into a database in close to live time.
2. Data Prep: clean the data and build the fields the model will train on.
3. Make Predictions: train an XGBoost model and score the most recent data.
4. Execute Trades: turn those predictions into buy/sell orders on the trading platform (Shrimpy, in my case).
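For step 1, my setup runs API connections that feed a database. Just as a rough illustration (a minimal sketch; the endpoint and fields below are hypothetical placeholders, not what my actual workflow uses), pulling price data from an exchange's REST API in R looks something like this:
library(jsonlite) # parse JSON responses from a REST API

# Hypothetical endpoint - substitute your exchange's actual OHLC/candles API
candles <- fromJSON("https://api.example-exchange.com/v1/candles?symbol=BTC-USDT&interval=1h")
prices <- as.data.frame(candles)
# From here, the rows would be written to the database that feeds the workflow
head(prices)
Steps 2 and 3 are where the Alteryx workflow does the heavy lifting. Here is the code from the first R tool, which trains the XGBoost model that gets refreshed once a day: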
library(tidyverse) # data manipulation
library(mlr) # ML package (also some data manipulation)
library(xgboost) # gradient boosting back end used by the mlr learner
train <- read.Alteryx("#1", mode = "data.frame")
test <- read.Alteryx("#2", mode = "data.frame")
# Fitting XGBoost
trainTask <- makeClassifTask(data = train, target = "Target6hChange_Binary", positive = "1")
testTask <- makeClassifTask(data = test, target = "Target6hChange_Binary")
set.seed(1)
# Create an xgboost learner that is classification based and outputs class probabilities
xgb_learner <- makeLearner(
  "classif.xgboost",
  predict.type = "prob",
  par.vals = list(
    objective = "binary:logistic",
    eval_metric = "auc",
    nrounds = 300,
    early_stopping_rounds = 30
  )
)
# Create a model
xgb_model <- train(xgb_learner, task = trainTask)
result <- predict(xgb_model, testTask)
head(result$data)
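# For a binary task scored with predict.type = "prob", result$data should hold one
# row per test observation, with columns along the lines of: id, truth, prob.0,
# prob.1, response (the prob.* names depend on the factor levels of the target).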
# Hyper-parameter Tuning
# Full list of parameters and explanations: https://xgboost.readthedocs.io/en/latest/parameter.html
# Could also run the following code to get a list with value ranges: getParamSet("classif.xgboost")
xgb_params <- makeParamSet(
  # The number of trees in the model (each one built sequentially)
  # makeIntegerParam("nrounds", lower = 50, upper = 500),
  # number of splits in each tree
  makeIntegerParam("max_depth", lower = 1, upper = 10),
  # "shrinkage" - prevents overfitting
  makeNumericParam("eta", lower = .1, upper = .5),
  # L2 regularization - prevents overfitting
  makeNumericParam("lambda", lower = -1, upper = 0, trafo = function(x) 10^x),
  # Type of booster
  makeDiscreteParam("booster", values = c("gbtree", "gblinear", "dart")),
  # Additional parameters to test
  makeNumericParam("min_child_weight", lower = 1L, upper = 10L),
  makeNumericParam("subsample", lower = 0.5, upper = 1),
  makeNumericParam("colsample_bytree", lower = 0.5, upper = 1),
  # Test error type
  makeDiscreteParam("eval_metric", values = c("auc", "rmse", "logloss"))
)
# Use random search over the hyperparameter space. Grid search (makeTuneControlGrid, with no
# maxit argument) evaluates every combination, which is extremely expensive computationally,
# but becomes feasible if you narrow the parameter ranges above.
control <- makeTuneControlRandom(maxit = 30)
# Create a description of the resampling plan
resample_desc <- makeResampleDesc("CV", iters = 5)
# Perform tuning
tuned_params <- tuneParams(
  learner = xgb_learner,
  task = trainTask,
  resampling = resample_desc,
  par.set = xgb_params,
  control = control
)
# Create a new model using tuned hyperparameters
xgb_tuned_learner <- setHyperPars(
  learner = xgb_learner,
  par.vals = tuned_params$x
)
# Re-train parameters using tuned hyperparameters (and full training set)
xgb_model <- train(xgb_tuned_learner, trainTask)
# Make a new prediction
result <- predict(xgb_model, testTask)
prediction <- result$data %>%
  select(pkey = id, Target6hChange_Binary = response) %>%
  # No sorting has happened, so everything still matches up.
  mutate(pkey = test$pkey)
# Output data to Alteryx
write.Alteryx(as.data.frame(prediction), 1)
# Save the xgb model (save.image() stores the whole R session, which the scoring script below loads)
save.image(file="C:\\Users\\Ricky\\Desktop\\CryptoWork2019\\November\\XGBoostModelsByExchange\\KuCoinXGBoostModel.RData")
This code was adapted from this excellent tutorial: https://rpubs.com/ippromek/336732
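As a side note, before saving the model you can sanity-check the tuned predictions by scoring the prediction object directly. This isn't part of the workflow above, just a minimal sketch using functions from the same mlr package:
# Confusion matrix of predicted vs. actual classes on the test data
calculateConfusionMatrix(result)
# AUC and accuracy for the same prediction object (AUC requires predict.type = "prob")
performance(result, measures = list(auc, acc))
The second R tool in the workflow then loads the model saved above and scores fresh data throughout the day: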
library(tidyverse) # data manipulation
library(mlr) # ML package (also some data manipulation)
library(xgboost) # gradient boosting back end used by the mlr learner
test <- read.Alteryx("#1", mode = "data.frame")
# Load XGBoost model which is refreshed once a day
load("C:\\Users\\Ricky\\Desktop\\CryptoWork2019\\November\\XGBoostModelsByExchange\\KuCoinXGBoostModel.RData")
# Create test task
testTask <- makeClassifTask(data = test, target = "Target6hChange_Binary")
# Make a new prediction
result <- predict(xgb_model, testTask)
prediction <- result$data %>%
  select(pkey = id, Target6hChange_Binary = response) %>%
  # Put back the original primary keys. No sorting has happened, so
  # everything still matches up.
  mutate(pkey = test$pkey)
# Print preview of predictions
print(head(prediction))
# Output data to Alteryx
write.Alteryx(as.data.frame(prediction), 1)
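The final step, executing trades, happens through Shrimpy rather than inside these R tools, so there is no R code to share for it. Just to give a sense of the general shape, submitting an order to a trading platform's REST API looks roughly like the sketch below; the URL, payload fields, and authentication scheme are hypothetical placeholders, not Shrimpy's actual API:
library(httr) # HTTP requests

# Hypothetical endpoint and payload - consult your platform's API documentation
response <- POST(
  "https://api.example-trading-platform.com/v1/orders",
  add_headers(Authorization = paste("Bearer", Sys.getenv("TRADING_API_KEY"))),
  body = list(symbol = "BTC/USDT", side = "buy", amount = 0.01),
  encode = "json"
)
stop_for_status(response) # error out if the order was rejected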
I hope this helps and makes sense. I shared what I felt was useful and relevant to your question. I'm not sharing the workflow itself for now because I honestly don't think it would be very useful: nothing would run, since all the database connections would break. If there is demand for it, though, I will give it some thought and try to make some form of template available. Let me know if you are still looking for more info, but I should definitely mention that it took a lot of time and effort to set up the infrastructure that gets the data flowing into the database used for this post in close to live time. And although there have been some positive results from the predictive modeling itself, they have not yet translated into a consistently positive trading strategy, so I would encourage you to do a project like this to enhance your learning rather than assuming it will actually end up working.
Let me know if you have any questions!
Ricky
Hello Ricky,
Thank you so very much for your detailed response.
I am relatively new to Alteryx, so it will take a while for me to digest what you wrote; it's above my pay grade!! 🙂
However, I will certainly look into this. My broker is OANDA, and as far as I know they have great databases, so I need to discover what is suitable and what is not.
I will be in touch with you if I have any updates.
All the best,
This is an amazing and uniquely creative post, Riccardo @ries9112 !
My colleague has actually written a Data Science blog post about building an XGBoost macro (in both R & Python). Do check it out here: https://community.alteryx.com/t5/Data-Science-Blog/Expand-Your-Predictive-Palette-XGBoost-in-Alteryx...
Best,
Michael Utama
Associate Solutions Engineer
Alteryx APAC
Thank you @mutama! That's a seriously awesome post and an even better macro! I actually stumbled upon it in the past and have used the R version (I could never quite figure out the installation on the Python side at the time), and I definitely should have mentioned that the macro is available in my use case. I didn't use it because I am quite comfortable in R and had experience using XGBoost models, but it's a no-brainer for others who don't have that comfort level, and I should have thought to make that adjustment. I just made a note to myself to include it within the use case; I plan on doing some additional work on the section describing the benefits achieved, and once I reach out to the moderator of the post to make those changes, I will be sure to include the excellent Data Science blog post as well.
Thank you for reaching out about that!
Ricky
Awesome workflow! I was trying to zoom in on some of the pictures you posted but can't see them that well. Is it possible to post the workflow file? I realize I don't have some of the same macros installed, but I'd still like to see the process. Thank you!
Hey! Can you share the workflow please?