Alteryx has supported the best of both worlds, code-free and code-friendly approaches to building your analytics and process automation, for quite a while. Apart from the ability to design Alteryx workflows in a drag-and-drop, code-free fashion, we allow users to bring their own code with the R tool and the Python tool, and also to manage their predictive models with Alteryx Promote.
This of course sparks more and more questions from our customers about things like version control, git integration, managing code base, etc.
Here comes the list of some of the FAQs that I have been getting myself:
Does Alteryx support version control (VC)?
What are the best practices for workflows & VC, but also code & VC?
Can we utilize our existing Git repository with Alteryx?
How to best manage Python or R code in Alteryx?
Any tips and tricks to effectively move code to our workflows?
How to manage source code of models for Alteryx Promote?
How to integrate our existing code repo with Alteryx Promote?
and many more...
Before I take a look at these topics and try to do my best to share my thoughts around them, let me introduce several key concepts.
Version Control (VC)
Version control as a concept has been around for decades now. It was designed to manage source code and keep track of changes to your files.
Nowadays virtually all programmers use some form of version control to manage their code base and to collaborate with other team members.
What does VC allow you to do then?
keep the entire history of a file and inspect a file throughout its lifetime
tag particular versions so you can go back to them easily
collaborate in teams and make contributions transparent
experiment with code and features without breaking the main project (branch/ merge)
Git has grown to become the de facto standard for VC. Honestly, most of the time I use "Git" and "VC" interchangeably (at least I do anyway).
Git itself is open source; hosting platforms built around it include GitLab (backed by Google's venture arm, among other investors), GitHub (Microsoft owned) and many others.
If you want to start learning a little more about Git, I believe that Atlassian has really great tutorials to begin with.
- Distributed type of VC - i.e. work offline, everyone has their own full copy and submits changes when needed
- Easy Integrations (issue tracking like JIRA, continuous integration like Jenkins, ....)
- Does not depend on IDE (Tool of choice for developers - like Visual Studio, PyCharm, IntelliJ, VIM)
- There are best practices but no two teams will use the same “workflow” with GIT
Bottom Line: For a developer, VC is not just version control, it's bread and butter. You, as a coder, are married to this thing. Period.
Alteryx & Visual Programming
You probably know that Alteryx Designer allows you to build visual workflow processes. This is almost like putting together a visual recipe of how you want to "cook" your data. Or, almost like taking Lego bricks and, one piece at a time, building that bad-**bleep** castle you have been planning with your kids for months now.
From a slightly different perspective, Designer is actually all about visual programming. The concept of visual programming (VPL) has actually been around for quite some time and I believe that everyone in Alteryx has been striving to bring this to a whole new level.
Alteryx & Workflow source code
Whenever you start building a workflow, all your "moves" in Alteryx Designer actually create XML code in the background.
Your YXMD (workflow) file is full of XML code. By all means you can view it or even parse it using Designer itself.
Even if you use some of the Python or R tools, this respective programming code will still be somewhere in the XML code of course.
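To make the "workflow is XML" point concrete, here is a minimal sketch that lists the tools in a workflow using only the Python standard library. The sample document is a hypothetical, heavily trimmed stand-in for a real YXMD file, and the element and attribute names used (Node, ToolID, GuiSettings, Plugin, Position) are assumptions about the layout that you should check against one of your own workflows.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed stand-in for a .yxmd file.
# Real workflows contain many more elements and attributes.
SAMPLE_YXMD = """<AlteryxDocument yxmdVer="2019.1">
  <Nodes>
    <Node ToolID="1">
      <GuiSettings Plugin="AlteryxBasePluginsGui.DbFileInput.DbFileInput">
        <Position x="54" y="54" />
      </GuiSettings>
    </Node>
    <Node ToolID="2">
      <GuiSettings Plugin="AlteryxBasePluginsGui.Filter.Filter">
        <Position x="162" y="54" />
      </GuiSettings>
    </Node>
  </Nodes>
</AlteryxDocument>
"""


def list_tools(xml_text):
    """Return (tool id, plugin name) pairs for every Node element found."""
    root = ET.fromstring(xml_text)
    tools = []
    for node in root.iter("Node"):
        gui = node.find("GuiSettings")
        # Some nodes may carry no GuiSettings/Plugin; report None for those.
        plugin = gui.get("Plugin") if gui is not None else None
        tools.append((node.get("ToolID"), plugin))
    return tools


if __name__ == "__main__":
    for tool_id, plugin in list_tools(SAMPLE_YXMD):
        print(tool_id, plugin)
```

For a real file you would read its contents from disk and pass the text to list_tools in the same way.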
Alteryx & Version Control
So how does Alteryx manage your workflows and achieve version control? The best practice to keep your workflows organised across your company (and not only for that) is to rely on Alteryx Server. Among other things, Alteryx Server has offered proper version control since version 10.0. As Alteryx is all about the VPL I mentioned above, and version control must be easily accessible to business users and not just developers, Alteryx by default uses its own proprietary way to manage and version your workflows (rather than relying on Git, for instance). This way all your workflows are centralised and backed up, and you can always redeploy a previous version of your work.
note: Also, you don't ever need to call your workflow something like WORKFLOW_FINAL_1, WORKFLOW_FINAL_2 etc. See the pic below.
As I noted in the previous paragraph, Alteryx Server utilizes its own proprietary version control mechanisms rather than relying on GIT.
Apart from Alteryx Server, can you use Git and commit & push your workflows to your code repo? Yes, by all means.
If your team is primarily technical and there is hunger for "proper" developer's VC and source code management tool then go for it.
Even internal developer teams in Alteryx will utilize Git for workflows we build, like Connect loaders to keep all history and utilize things like Jenkins for automated builds and integration by DevOps teams.
Should you just replace the Server version control with this? I would not think that would be a recommended best practice as you need business users / end users supported too. And Server has your back there.
Just keep in mind that every small change to your workflow, like moving a tool to a new location, will change the XML source code, so things like version diffs become a little more tricky with Git only. Best practice there would be to rely on macros and to fragment large workflows into smaller bits and pieces so your XML code does not change that massively between edits.
The same goes for merging code which can be a little more tricky due to the way we manage metadata and dependencies in workflows.
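To illustrate why diffs get noisy, here is a small sketch that compares two versions of a workflow after dropping layout-only information. It assumes that a tool's canvas location is stored in Position elements, which is an assumption about the YXMD layout rather than a documented guarantee, and the two sample documents are hypothetical. It is meant as an illustration, not as a replacement for the macro-based practice described above.

```python
import difflib
import xml.etree.ElementTree as ET

# Hypothetical trimmed workflow: the second version only moves the tool.
BEFORE = """<AlteryxDocument>
  <Nodes>
    <Node ToolID="1">
      <GuiSettings Plugin="AlteryxBasePluginsGui.Filter.Filter">
        <Position x="54" y="54" />
      </GuiSettings>
    </Node>
  </Nodes>
</AlteryxDocument>"""
AFTER = BEFORE.replace('x="54"', 'x="300"')


def strip_positions(xml_text):
    """Return the XML with every Position element removed."""
    root = ET.fromstring(xml_text)
    # Materialise the iterator first so removing children is safe.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == "Position":
                parent.remove(child)
    return ET.tostring(root, encoding="unicode")


def layout_insensitive_diff(old_xml, new_xml):
    """Unified diff of two workflows that ignores tool positions."""
    old_lines = strip_positions(old_xml).splitlines()
    new_lines = strip_positions(new_xml).splitlines()
    return list(difflib.unified_diff(old_lines, new_lines, lineterm=""))


if __name__ == "__main__":
    raw = list(difflib.unified_diff(BEFORE.splitlines(), AFTER.splitlines(), lineterm=""))
    print("raw diff lines:", len(raw))
    print("layout-insensitive diff lines:", len(layout_insensitive_diff(BEFORE, AFTER)))
```

A real change, such as swapping the plugin of a tool, still shows up in the layout-insensitive diff, while a pure move on the canvas does not.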
Alteryx & Git for Python/ R tools
You can actually go beyond managing just workflow XML files in Git for Alteryx. What if we need to do this the other way around?
This means taking your existing Git code repository with your R scripts or Python code and pushing that code into the code tools of your Alteryx workflow.
With a small script, you can keep all your Python code in a Git repo and "inject" this code into the Python code tool of your workflow.
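The original snippet is not reproduced here, so the following is only a minimal sketch of the idea, not the author's script. It assumes the repo has already been cloned or pulled to the machine running the workflow; the function name run_repo_script and the example path are hypothetical.

```python
from pathlib import Path


def run_repo_script(script_path, namespace=None):
    """Read a Python script from a local Git working copy and execute it.

    Returns the namespace the script ran in, so that functions and
    variables it defines can be used by the calling code afterwards.
    """
    source = Path(script_path).read_text(encoding="utf-8")
    namespace = {} if namespace is None else namespace
    # Compiling with the real file name keeps tracebacks pointing at the repo file.
    code = compile(source, str(script_path), "exec")
    exec(code, namespace)
    return namespace


# Hypothetical usage inside the Python tool (the path is an assumption):
# ns = run_repo_script(r"C:\repos\analytics\clean_data.py")
# result = ns["clean"](my_dataframe)
```

Passing globals() as the namespace would make the script's names available directly in the calling notebook, at the cost of possibly overwriting existing ones. Only run code from repositories you trust, since exec runs whatever the file contains.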
I have had the same thing planned for the R tool and should be pretty straightforward too. Anyone wanting to beat me to it? @ShaanM?
Alteryx Designer and Code Takeaways
Designer uses concepts of visual programming
Workflows can be visually compared to one another
Workflows are versioned on Alteryx Server using proprietary versioning (and this is the best practice that supports all types of users)
Alteryx does not rely heavily on Git and does not use it natively, but you can utilise parts of it if you want
Alteryx workflows are XML code -> you can use GIT to manage that XML code
You can use GIT with your existing Python or R scripts. Alteryx Python Tool allows you to dynamically inject code from a flat file location (from local GIT repo) with simple script - see details above
Promote & Code & Versioning
Alteryx Promote definitely has your back when it comes to managing & deploying your predictive models. This will, in reality, mean managing quite a bit of R or Python code of course. Most of the time anyway. Note: Alteryx predictive models can be deployed too of course!
Let me talk briefly about versioning and Git integration regarding Promote:
Deployed Promote models are versioned and accessible on Promote web interface (can redeploy and see the history for auditing and regulatory purposes)
Source code of models is versioned as part of the Docker image but is not easily retrievable (not through the web-based UI anyway) - developers/ coders should keep using their standard VC/ Git as best practice (include the Promote model version in commit messages etc.)
Most Data scientists/ developers will most likely avoid Designer and publish from IDE of their choice directly
Sample code from Python/ R is accessible on GitHub through
Promote fetches packages from CRAN (R) or PYPI (Python’s PIP) during model build
Promote can be set up to retrieve code from your own GitHub repo during model build but also to use the Ubuntu command with sh script to install the env
Promote & “Code Fetch” with CRAN/ PyPI
Docker (Swarm) creates a container from the R or Python base image (it's a little more complicated but let's stick to this for now)
For R models, Promote takes the required packages specified in the promote.library() function of your model, fetches them from the CRAN repository and installs them to the container. note: CRAN is the main repo of R packages (15k), this is where your code comes from when you do install.packages()
For Python models you provide the libraries you want to install in a requirements.txt file alongside your model's source code. These need to be accessible through PyPI. note: PyPI (Python Package Index) is the main repo of Python packages (200k), installed mainly using pip install
Promote can also integrate code snippets from GitHub
A GitHub link with a token to the GitHub account needs to be presented in Python packages.
The last way is to use the promote.sh script and provide the libraries you want to install as an Ubuntu command within the shell script for either R or Python
Hope you will find this article useful! Let me know if you have any questions or want to ask/share some best practices I forgot about.
Thank you for this very informative article @DavidM !
I'd like to get your advice on one thing - the data scientist in my team has his Python scripts in Git. I was wondering, is it possible to call those remote scripts from Alteryx Designer using a URL link, the Python tool, etc.?
With Git I believe the common practice is to git pull to the local machine first and work with a 1:1 copy of your code repo on whichever machine you need that Python to run from.
I can imagine though that if you prefer to keep your code in the cloud Git repo only, there would typically be an API available for that Git service that can be used to inject code into the Python Code tool.
I, for instance, had a similar discussion with a customer around Bitbucket, and it supports exactly that.
Python injection does not have to happen just with files (i.e. after a git pull to your machine); I would think you could replicate the approach mentioned in the article to load code into the Python Code tool with REST API calls to your Git repo, if that makes sense.
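As a rough sketch of that REST idea, the snippet below fetches a script over HTTP(S) with the standard library and executes it. Everything service-specific is an assumption: the URL of the raw file, and the bearer-token header format, both depend on your Git hosting service, so check its API documentation. The function names are hypothetical.

```python
import urllib.request


def fetch_source(url, token=None):
    """Download the text of a script from a URL."""
    request = urllib.request.Request(url)
    if token:
        # Assumption: the service accepts a bearer token in this header.
        # Some services expect a different scheme or header name.
        request.add_header("Authorization", "Bearer " + token)
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8")


def run_remote_script(url, token=None, namespace=None):
    """Fetch a script and execute it, returning the namespace it ran in."""
    namespace = {} if namespace is None else namespace
    source = fetch_source(url, token)
    # Using the URL as the file name makes tracebacks easier to read.
    exec(compile(source, url, "exec"), namespace)
    return namespace


# Hypothetical usage (URL and token are placeholders, not real endpoints):
# ns = run_remote_script("https://git.example.com/raw/team/repo/main/model.py",
#                        token="<personal access token>")
```

Since this executes whatever the URL returns, it only makes sense for repositories and branches you fully trust; pinning the URL to a specific commit or tag rather than a moving branch also keeps workflow runs reproducible.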