Hi everyone,
Alteryx has long supported the best of both worlds - code-free and code-friendly approaches to building your analytics and process automation. Apart from the ability to design Alteryx workflows in a drag-and-drop, code-free fashion, we allow users to bring their own code with the R-code tool and the Python-code tool, and to manage their predictive models with Alteryx Promote.
This of course sparks more and more questions from our customers about things like version control, git integration, managing code base, etc.
Here is a list of some of the FAQs that I have been getting myself:
Before I take a look at these topics and try to do my best to share my thoughts around them, let me introduce several key concepts.
Version Control (VC)
Version control as a concept has been around for decades now. It was designed to manage source code and keep track of changes to your files.
All programmers nowadays will use some form of version control to manage their code base but also to collaborate with other team members.
What does VC allow you to do then?
GIT
Git has grown to become the de-facto standard for VCs. Honestly, most of the time I am using GIT and VC interchangeably (at least myself anyway).
Git-based hosting services come in various flavours, like GitLab (Google backed), GitHub (Microsoft owned) and many others.
If you want to start learning a little more about Git, I believe that Atlassian has really great tutorials to begin with.
Main benefits:
- Distributed type of VC - i.e. work offline, everyone has their own full copy and submits changes when needed
- Easy Integrations (issue tracking like JIRA, continuous integration like Jenkins, ....)
- Does not depend on IDE (Tool of choice for developers - like Visual Studio, PyCharm, IntelliJ, VIM)
- There are best practices but no two teams will use the same “workflow” with GIT
Bottom Line: For a developer, VC is not just version control, it's bread and butter. You, as a coder, are married to this thing. Period.
Alteryx & Visual Programming
You probably know that Alteryx Designer allows you to build visual workflow processes. This is almost like putting together a visual recipe of how you want to "cook" your data. Or, almost like taking Lego bricks and, one piece at a time, building that bad-**bleep** castle you have been planning with your kids for months now.
From a slightly different perspective, Designer is actually all about visual programming. The concept of visual programming (VPL) has actually been around for quite some time and I believe that everyone in Alteryx has been striving to bring this to a whole new level.
Alteryx & Workflow source code
Whenever you start building a workflow, all your "moves" in Alteryx Designer actually create XML code in the background.
Your YXMD (workflow) file is full of XML code. By all means you can view it or even parse it using Designer itself.
Even if you use some of the Python or R tools, this respective programming code will still be somewhere in the XML code of course.
Alteryx & Version Control
So how does Alteryx manage your workflows and achieve version control? The best practice to (not only) keep your workflows organised across your company is to rely on Alteryx Server. Among other things, Alteryx Server has offered proper version control since version 10.0. Of course, as Alteryx is all about the VPL I mentioned above, and version control must be easily accessible even to business users and not just developers, Alteryx by default uses its own proprietary way to manage your workflows and their versioning (rather than relying on Git, for instance). This way all your workflows are centralised and backed up, and you can always redeploy a previous version of your work.
note: Also, you don't ever need to call your workflow something like WORKFLOW_FINAL_1, WORKFLOW_FINAL_2 etc. See the pic below.
Besides this, you can also use a nice little trick to visually compare your workflows. Something coders would ask about quite frequently.
Alteryx & Git for workflows
As I noted in the previous paragraph, Alteryx Server utilizes its own proprietary version control mechanisms rather than relying on GIT.
Apart from Alteryx Server, can you use Git and commit & push your workflows to your code repo? Yes, by all means.
If your team is primarily technical and there is hunger for "proper" developer's VC and source code management tool then go for it.
Even internal developer teams in Alteryx utilize Git for workflows we build, like Connect loaders, to keep all history, and our DevOps teams utilize things like Jenkins for automated builds and integration.
Should you just replace the Server version control with this? I would not think that would be a recommended best practice as you need business users / end users supported too. And Server has your back there.
Just keep in mind that every small change of your workflow, like moving a tool to a new location will change the XML source code so things like version diff become a little more tricky with GIT only. Best practice there would be to rely on macros and fragment large workflows into smaller bits and pieces so your XML code does not change that massively between edits.
The same goes for merging code which can be a little more tricky due to the way we manage metadata and dependencies in workflows.
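To make the diff point a bit more concrete, here is a minimal Python sketch that blanks out layout-only data before comparing two versions of a workflow. It assumes the canvas coordinates live as attributes on Position elements in the workflow XML; the sample XML and the function names are just illustrative and simplified, so check them against your own YXMD files first.

```python
import xml.etree.ElementTree as ET


def normalized_workflow_xml(xml_text: str) -> str:
    """Return the workflow XML with tool coordinates blanked out."""
    root = ET.fromstring(xml_text)
    # Assumption: tool coordinates are attributes on <Position> elements.
    for position in root.iter("Position"):
        position.attrib.clear()
    return ET.tostring(root, encoding="unicode")


def same_logic(old_xml: str, new_xml: str) -> bool:
    """True when two versions differ only in tool positions (or not at all)."""
    return normalized_workflow_xml(old_xml) == normalized_workflow_xml(new_xml)


# Simplified, made-up workflow fragment used only for this demo.
BEFORE = """<AlteryxDocument>
  <Nodes>
    <Node ToolID="1">
      <GuiSettings><Position x="54" y="54" /></GuiSettings>
      <Properties><Configuration><Expression>[A] + 1</Expression></Configuration></Properties>
    </Node>
  </Nodes>
</AlteryxDocument>"""

MOVED = BEFORE.replace('x="54" y="54"', 'x="210" y="96"')  # tool dragged elsewhere
EDITED = BEFORE.replace("[A] + 1", "[A] + 2")              # real logic change

print(same_logic(BEFORE, MOVED))
print(same_logic(BEFORE, EDITED))
```

Something like this could run as a pre-diff step, so that a review only flags versions where the configuration actually changed rather than every drag of a tool on the canvas.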
Alteryx & Git for Python/ R tools
You can actually go beyond managing just workflow XML files in your GIT for Alteryx. What if we need to do this the other way around?
This means utilising your existing code repository in GIT with your R scripts or Python code to push this to Alteryx workflow code tools.
Sure - I actually wrote a short article about that a few weeks back. Python Code Tool Script Runner Macro (Code Injection).
With this bit, you can easily keep all your Python code in a GIT repo and "inject" this code into the Python code tool of your workflow.
I have had the same thing planned for the R tool and it should be pretty straightforward too. Anyone wanting to beat me to it? @ShaanM?
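For the Python side, the general idea can be sketched roughly like this. This is not the macro from the article, just a minimal illustration that assumes the repo has already been pulled to a local folder. The function name run_repo_script and the file clean.py are hypothetical, and a temporary folder stands in for the local clone.

```python
import tempfile
from pathlib import Path


def run_repo_script(repo_dir: str, relative_path: str) -> dict:
    """Execute a Python script from a local Git working copy.

    Returns the namespace the script ran in, so the caller can pick up
    whatever variables the script defined.
    """
    script_path = Path(repo_dir) / relative_path
    source = script_path.read_text(encoding="utf-8")
    namespace = {"__name__": "__main__", "__file__": str(script_path)}
    # compile() with the real path keeps tracebacks pointing at the repo file.
    exec(compile(source, str(script_path), "exec"), namespace)
    return namespace


# Demo: a temporary folder plays the role of the pulled repository.
with tempfile.TemporaryDirectory() as repo:
    (Path(repo) / "clean.py").write_text("result = 6 * 7\n", encoding="utf-8")
    demo = run_repo_script(repo, "clean.py")

print(demo["result"])
```

The code tool then only holds a couple of lines pointing at the repo, while the actual logic stays versioned in GIT. Keep in mind that exec runs whatever is in the file, so only point it at repositories you trust.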
Alteryx Designer and Code Takeaways
Promote & Code & Versioning
Alteryx Promote definitely has your back when it comes to managing & deploying your predictive models. This will, in reality, mean managing quite a bit of R or Python code of course. Most of the time anyway. Note: Alteryx predictive models can be deployed too of course!
Let me talk briefly about versioning and GIT integration here regarding Promote:
Promote & “Code Fetch” with CRAN/ PyPI
Hope you will find this article useful! Let me know if you have any questions or want to ask/share some best practices I forgot about.
Regs,
dm
Epic post @DavidM !
This topic comes up all the time and it's great to have a comprehensive write-up that I can direct people to in future. Thanks!
Thank you for this very informative article @DavidM !
I'd like to get your advice on one thing. The data scientist in my team has his Python scripts in git. I was wondering, is it possible to call those remote scripts from Alteryx Designer using a URL link, the Python tool, etc.?
Thanks in advance!
Lisa
Hi @_Lisa_ ,
With GIT, I believe the common practice is to GIT PULL to the local machine first and work with a 1:1 copy of your code repo on whichever machine you need that Python to run from.
I can imagine, though, that if you prefer to keep your code in the cloud git repo only, there would typically be an API available for that GIT service that can be used to inject code into the Python Code tool.
For instance, I had a similar discussion with a customer around BITBUCKET, and it supports exactly that:
https://developer.atlassian.com/server/bitbucket/how-tos/command-line-rest/
Python injection does not have to rely on files (i.e. after a GIT PULL to your machine). I would think you could replicate the approach from the code injection article to load code into the Python Code tool with REST API calls to your GIT repo, if that makes sense.
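A rough sketch of what that could look like in Python is below. It assumes your GIT service exposes the raw file content over HTTPS and accepts a bearer token; the URL, the token and the function names are hypothetical, so check the REST documentation of your own service for the real endpoint and authentication scheme. The fake_opener only stands in for the network so the example runs offline.

```python
import io
import urllib.request


def build_request(raw_file_url: str, token: str) -> urllib.request.Request:
    """Prepare an authenticated GET request for one file in the repo."""
    # Assumption: the service accepts a bearer token; yours may differ.
    return urllib.request.Request(
        raw_file_url, headers={"Authorization": f"Bearer {token}"}
    )


def fetch_script(raw_file_url: str, token: str, opener=urllib.request.urlopen) -> str:
    """Download the script source as text."""
    with opener(build_request(raw_file_url, token)) as response:
        return response.read().decode("utf-8")


def fake_opener(request):
    """Offline stand-in for urlopen, returning a canned script body."""
    return io.BytesIO(b"result = 6 * 7\n")


# Hypothetical URL and token, for illustration only.
source = fetch_script(
    "https://git.example.com/raw/clean.py", "dummy-token", opener=fake_opener
)
namespace = {}
exec(source, namespace)  # only do this with code from a repo you trust
print(namespace["result"])
```

In real use you would drop the opener argument so the default urlopen performs the call, and keep the token out of the workflow itself.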
Hope this helps.
dm
Insightful article! Thank you so much for sharing. This is really helpful!
Excellent write up! It is a pity that git and Alteryx aren't tied together more. I understand the visual programming "constraint" though.
Still, on the desktop, it would be great to be able to roll back to a previous commit, or have development branches to try something new, without messing up your workflows. That kind of tie-in would be marvelous.
Thanks, David. One question though. My company stores Macros on a drive mounted to the servers. We thought that was an acceptable practice. Since these macros are not stored on or in the Alteryx server then don't you need to have some form of version control in addition to what Alteryx provides? Is it better to load all macros to the server for that reason?
Thanks for the article @DavidM
We are in the process of setting up a data platform with Alteryx and GIT and have been struggling to find a good solution. The main issue we face is that we want to use a Dev and a Prod environment, and every time we update a workflow in Dev and re-submit to GIT, we can't just copy that workflow into the Prod GIT folder and overwrite the existing version on the Gallery (i.e. Save as... and overwrite). It seems that you can only overwrite anything on the Gallery if you open the respective workflow from the Gallery, which is a real pain if you want to have a Dev and a Prod environment. Our solution is to delete and re-upload a workflow once it has changed, which unfortunately means we can't use the Gallery version control.
If you have any suggestions to do that better, would be greatly appreciated!