Alteryx has supported the best of both worlds, code-free and code-friendly approaches to building your analytics and process automation, for quite a while. Besides designing Alteryx workflows in a drag-and-drop, code-free fashion, users can bring their own code through the R tool and the Python tool, and manage their predictive models with Alteryx Promote.
This of course sparks more and more questions from our customers about things like version control, Git integration, managing a code base, etc.
Here are some of the FAQs that I have been getting myself:
- Does Alteryx support version control (VC)?
- What are the best practices for workflows & VC, but also code & VC?
- Can we utilize our existing Git repository with Alteryx?
- How to best manage Python or R code in Alteryx?
- Any tips and tricks to effectively move code to our workflows?
- How to manage source code of models for Alteryx Promote?
- How to integrate our existing code repo with Alteryx Promote?
- and many more...
Before I take a look at these topics and try to do my best to share my thoughts around them, let me introduce several key concepts.
Version Control (VC)
Version control as a concept has been around for decades now. It was designed to manage source code and keep track of changes to your files.
Almost every programmer nowadays uses some form of version control, both to manage their code base and to collaborate with other team members.
What does VC allow you to do then?
- keep the entire history of a file and inspect the file throughout its lifetime
- tag particular versions so you can go back to them easily
- collaborate in teams and make contributions transparent
- experiment with code and features without breaking the main project (branch/merge)
Git has grown to become the de facto standard for VC. Honestly, most of the time I use the terms Git and VC interchangeably.
Git repositories can be hosted on various platforms like GitLab (Google backed), GitHub (Microsoft owned) and many others.
If you want to start learning a little more about Git, I believe that Atlassian has really great tutorials to begin with. A few things worth knowing about Git:
- Distributed type of VC, i.e. you can work offline, everyone has their own full copy and submits changes when needed
- Easy integrations (issue tracking like JIRA, continuous integration like Jenkins, ...)
- Does not depend on the IDE (the developer's tool of choice, like Visual Studio, PyCharm, IntelliJ, Vim)
- There are best practices, but no two teams will use the same “workflow” with Git
Bottom line: for a developer, VC is not just version control, it’s bread and butter. You, as a coder, are married to this thing. Period.
Alteryx & Visual Programming
You probably know that Alteryx Designer allows you to build visual workflow processes. This is almost like putting together a visual recipe of how you want to "cook" your data. Or, almost like taking Lego bricks and, one piece at a time, building that awesome castle you have been planning with your kids for months now.
From a slightly different perspective, Designer is actually all about visual programming. The concept of visual programming (VPL) has been around for quite some time, and I believe that everyone at Alteryx has been striving to bring it to a whole new level.
Alteryx & Workflow source code
Whenever you start building a workflow, all your "moves" in Alteryx Designer actually create XML code in the background.
Your YXMD (workflow) file is full of XML code. By all means, you can view it or even parse it using Designer itself.
Even if you use the Python or R tools, that programming code is still stored somewhere inside the XML code of the workflow.
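Since a workflow is just XML, you can also read it outside Designer. Below is a minimal sketch in Python using only the standard library. The sample document is a hypothetical, heavily simplified stand-in for a real YXMD file; the element and attribute names (Node, ToolID, GuiSettings, Plugin, Position) are assumptions for illustration, so check them against one of your own workflows before relying on them.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified stand-in for the contents of a .yxmd file.
# Element and attribute names are assumptions; compare with a real workflow.
SAMPLE_WORKFLOW = """<?xml version="1.0"?>
<AlteryxDocument yxmdVer="2019.1">
  <Nodes>
    <Node ToolID="1">
      <GuiSettings Plugin="AlteryxBasePluginsGui.DbFileInput.DbFileInput">
        <Position x="54" y="54" />
      </GuiSettings>
    </Node>
    <Node ToolID="2">
      <GuiSettings Plugin="AlteryxBasePluginsGui.Filter.Filter">
        <Position x="174" y="54" />
      </GuiSettings>
    </Node>
  </Nodes>
</AlteryxDocument>
"""


def list_tools(workflow_xml):
    """Return (tool id, plugin name) pairs for every Node in the workflow XML."""
    root = ET.fromstring(workflow_xml)
    tools = []
    for node in root.iter("Node"):
        gui = node.find("GuiSettings")
        plugin = gui.get("Plugin") if gui is not None else None
        tools.append((node.get("ToolID"), plugin))
    return tools


if __name__ == "__main__":
    for tool_id, plugin in list_tools(SAMPLE_WORKFLOW):
        print(tool_id, plugin)
```

For a real file you would read its text from disk and pass that to list_tools instead of the sample string.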
Alteryx & Version Control
So how does Alteryx manage your workflows and achieve version control? The best practice for keeping your workflows organised across your company (and not only for that) is to rely on Alteryx Server. Among other things, Alteryx Server has offered proper version control since version 10.0. Because Alteryx is all about the VPL I mentioned above, and version control must be easily accessible to business users and not just developers, Alteryx Server by default uses its own proprietary way of managing and versioning your workflows (rather than relying on Git, for instance). This way all your workflows are centralised and backed up, and you can always redeploy a previous version of your work.
note: Also, you don't ever need to call your workflow something like WORKFLOW_FINAL_1, WORKFLOW_FINAL_2 etc. See the pic below.
Besides this, you can also use a nice little trick to visually compare your workflows, something coders ask about quite frequently.
Alteryx & Git for workflows
As I noted in the previous paragraph, Alteryx Server utilizes its own proprietary version control mechanism rather than relying on Git.
Apart from Alteryx Server, can you use Git and commit & push your workflows to your code repo? Yes, by all means.
If your team is primarily technical and there is a hunger for a "proper" developer VC and source code management tool, then go for it.
Even internal developer teams at Alteryx use Git for the workflows we build, like the Connect loaders, to keep the full history and to let DevOps teams use things like Jenkins for automated builds and integration.
Should you just replace the Server version control with this? I would not recommend that as a best practice, as you need to support business users and end users too. And Server has your back there.
Just keep in mind that every small change to your workflow, like moving a tool to a new location, will change the XML source code, so things like a version diff become a little more tricky with Git only. The best practice there is to rely on macros and to break large workflows into smaller pieces, so your XML code does not change that massively between edits.
The same goes for merging code which can be a little more tricky due to the way we manage metadata and dependencies in workflows.
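To illustrate the diff problem, here is a minimal sketch that strips canvas positions from two versions of a workflow before comparing them, so that merely moving a tool no longer shows up as a change. It is an illustration only, not a feature of Alteryx or Git. The sample XML and the Position, Configuration and Expression element names are hypothetical simplifications of a real workflow file.

```python
import difflib
import xml.etree.ElementTree as ET

# Hypothetical, simplified workflow XML; the element names are assumptions.
OLD = """<AlteryxDocument>
  <Nodes>
    <Node ToolID="1">
      <GuiSettings Plugin="AlteryxBasePluginsGui.Filter.Filter">
        <Position x="54" y="54" />
      </GuiSettings>
      <Properties>
        <Configuration>
          <Expression>[Sales] &gt; 100</Expression>
        </Configuration>
      </Properties>
    </Node>
  </Nodes>
</AlteryxDocument>"""

MOVED = OLD.replace('x="54"', 'x="300"')  # the tool was only dragged elsewhere
EDITED = OLD.replace("100", "250")        # the filter logic really changed


def normalized_lines(workflow_xml):
    """Return the workflow XML as lines, with all Position elements removed."""
    root = ET.fromstring(workflow_xml)
    # Snapshot the elements first so removal does not disturb the traversal.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == "Position":
                parent.remove(child)
    return ET.tostring(root, encoding="unicode").splitlines()


def logical_diff(old_xml, new_xml):
    """Unified diff of two workflows that ignores layout-only changes."""
    return list(difflib.unified_diff(
        normalized_lines(old_xml), normalized_lines(new_xml),
        fromfile="old", tofile="new", lineterm=""))


if __name__ == "__main__":
    print("moved only:", logical_diff(OLD, MOVED))
    print("\n".join(logical_diff(OLD, EDITED)))
```

Moving the tool yields an empty diff, while changing the filter expression still shows up. A real workflow stores more layout details than positions, so treat this as a starting point rather than a complete normalizer.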
Alteryx & Git for Python/ R tools
You can actually go beyond managing just workflow XML files in your Git repo for Alteryx. What if we need to do this the other way around?
This means taking your existing Git code repository with your R scripts or Python code and pushing that code to the code tools of an Alteryx workflow.
Sure. I actually wrote a short article about that a few weeks back: Python Code Tool Script Runner Macro (Code Injection).
With this bit, you can easily keep all your Python code in a Git repo and "inject" this code into the Python code tool of your workflow.
I have had the same thing planned for the R tool, and it should be pretty straightforward too. Anyone wanting to beat me to it? @ShaanM?
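The macro from that article is not reproduced here, but the underlying idea can be sketched in a few lines: the code tool holds only a small loader, and the real logic lives in a script file inside your local Git working copy. The sketch below uses the standard library's runpy module; the function name and the example path are hypothetical, and this is one possible approach rather than the macro's actual implementation.

```python
import runpy
from pathlib import Path


def run_repo_script(script_path):
    """Execute a Python script kept in a local Git working copy.

    Returns the script's global variables as a dict, so the caller
    can pick up whatever the script produced.
    """
    path = Path(script_path)
    if not path.is_file():
        raise FileNotFoundError(f"Script not found: {path}")
    return runpy.run_path(str(path), run_name="__main__")


# Example call (hypothetical path to a script in your cloned repo):
# results = run_repo_script(r"C:\repos\analytics\score_customers.py")
# output_frame = results["output_frame"]
```

The design choice here is that the workflow XML stays small and stable, while every change to the logic goes through normal Git commits on the script file.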
Alteryx Designer and Code Takeaways
- Designer uses concepts of visual programming
- Workflows can be visually compared to one another
- Workflows are versioned on Alteryx Server using proprietary versioning (and this is the best practice that supports all types of users)
- Alteryx does not rely heavily on Git and does not use it natively, but you can use Git alongside it if you want
- Alteryx workflows are XML code -> you can use Git to manage that XML code
- You can use Git with your existing Python or R scripts. The Alteryx Python tool allows you to dynamically inject code from a flat file location (e.g. a local Git repo) with a simple script - see details above
Promote & Code & Versioning
Alteryx Promote definitely has your back when it comes to managing & deploying your predictive models. This will, in reality, mean managing quite a bit of R or Python code of course. Most of the time anyway. Note: Alteryx predictive models can be deployed too of course!
Let me talk briefly about versioning and Git integration in Promote:
- Deployed Promote models are versioned and accessible on the Promote web interface (you can redeploy them and see the history for auditing and regulatory purposes)
- The source code of a model is versioned as part of the Docker image but is not easily retrievable (not through the web-based UI anyway), so developers/coders should keep using their standard VC/Git as a best practice (include the Promote model version in commit messages etc.)
- Python models must have a certain structure
- Most data scientists and developers will likely skip Designer and publish directly from the IDE of their choice
- Sample code from Python/ R is accessible on GitHub through
- Promote fetches packages from CRAN (R) or PyPI (via Python’s pip) during the model build
- Promote can be set up to retrieve code from your own GitHub repo during the model build, and also to run Ubuntu commands from an sh script to set up the environment
Promote & “Code Fetch” with CRAN/ PyPI
- Docker (Swarm) creates a container from the R or Python base image (it's a little more complicated, but let's stick to this for now)
- For R models, Promote fetches the required packages specified in the promote.library() function of your model from the CRAN repository and installs them into the container. Note: CRAN is the main repo of R packages (15k); this is where packages come from when you run install.packages()
- For Python models, you list the libraries you want to install in a requirements.txt file alongside your model's source code. These need to be accessible through PyPI. Note: PyPI (Python Package Index) is the main repo of Python packages (200k), installed mainly using pip install
- Promote can also integrate code snippets from GitHub
- A GitHub link with a token for the GitHub account needs to be provided for Python packages.
- The last option is to use the promote.sh script and provide the libraries you want to install as Ubuntu commands within the shell script, for either R or Python
Hope you will find this article useful! Let me know if you have any questions or want to ask/share some best practices I forgot about.