Alteryx Designer Desktop Discussions

Alteryx vs Version Control, Code Repo/Git, R and Python code, Best Practices, Promote Code

DavidM
Alteryx

Hi everyone,

 

Alteryx has been supporting the best of both worlds, code-free and code-friendly approaches to building your analytics and process automation, for quite a while. Apart from the ability to design Alteryx workflows in a drag-and-drop, code-free fashion, we allow users to bring their own code with the R code tool and the Python code tool, and also to manage their predictive models with Alteryx Promote.

This of course sparks more and more questions from our customers about things like version control, git integration, managing code base, etc.

 

Here comes the list of some of the FAQs that I have been getting myself:

  • Does Alteryx support version control (VC)?
  • What are the best practices for workflows & VC, but also code & VC?
  • Can we utilize our existing Git repository with Alteryx?
  • How to best manage Python or R code in Alteryx?
  • Any tips and tricks to effectively move code to our workflows?
  • How to manage source code of models for Alteryx Promote?
  • How to integrate our existing code repo with Alteryx Promote?
  • and many more...

Before I take a look at these topics and try to do my best to share my thoughts around them, let me introduce several key concepts.

 

Version Control (VC)

Version control as a concept has been around for decades now. It was designed to manage source code and keep track of changes to your files.

Virtually all programmers nowadays use some form of version control to manage their code base and to collaborate with other team members.

What does VC allow you to do then?

  • keep the entire history of a file and inspect the file throughout its lifetime
  • tag particular versions so you can go back to them easily
  • collaborate in teams and make contributions transparent
  • experiment with code and features without breaking the main project (branch/ merge)

image.png

GIT

Git has grown to become the de-facto standard for VCs. Honestly, most of the time I am using GIT and VC interchangeably (at least myself anyway).

Git repositories can be hosted on various platforms like GitLab (Google backed), GitHub (Microsoft owned) and many others.

If you want to start learning a little more about Git, I believe that Atlassian has really great tutorials to begin with.

Main benefits:

      - Distributed type of VC - i.e. work offline, everyone has their own full copy and submits changes when needed

      - Easy Integrations (issue tracking like JIRA, continuous integration like Jenkins, ....)

      - Does not depend on IDE (Tool of choice for developers - like Visual Studio, PyCharm, IntelliJ, VIM)

      - There are best practices but no two teams will use the same “workflow” with GIT

Bottom Line: For a developer, VC is not just version control, it's their bread and butter. You, as a coder, are married to this thing. Period.

 

image.png

Alteryx & Visual Programming

You probably know that Alteryx Designer allows you to build visual workflow processes. This is almost like putting together a visual recipe of how you want to "cook" your data. Or, almost like taking Lego bricks and, one piece at a time, building that awesome castle you have been planning with your kids for months now.

From a slightly different perspective, Designer is actually all about visual programming. The concept of visual programming (VPL) has actually been around for quite some time and I believe that everyone in Alteryx has been striving to bring this to a whole new level.

 

Alteryx & Workflow source code

Whenever you start building a workflow, all your "moves" in Alteryx Designer actually create XML code in the background.

Your YXMD (workflow) file is full of XML code. By all means you can view it or even parse it using Designer itself.
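If you prefer to look at the XML outside of Designer, here is a minimal Python sketch of what parsing it could look like. Note that the XML fragment below is a simplified, hand-written stand-in for a real YXMD file, and the element and attribute names (Nodes, Node, ToolID, GuiSettings, Plugin) are assumptions on my side, so please check them against one of your own workflow files. The function name list_tools is just made up for this example.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Simplified, hand-written stand-in for the contents of a .yxmd file.
# Element and attribute names are assumptions; inspect a real workflow to confirm.
SAMPLE_WORKFLOW = """<AlteryxDocument yxmdVer="2019.1">
  <Nodes>
    <Node ToolID="1">
      <GuiSettings Plugin="AlteryxBasePluginsGui.DbFileInput.DbFileInput">
        <Position x="54" y="54" />
      </GuiSettings>
    </Node>
    <Node ToolID="2">
      <GuiSettings Plugin="AlteryxBasePluginsGui.Filter.Filter">
        <Position x="150" y="54" />
      </GuiSettings>
    </Node>
  </Nodes>
</AlteryxDocument>"""


def list_tools(workflow_xml: str) -> List[Tuple[str, str]]:
    """Return (tool id, plugin name) pairs for every Node in the workflow XML."""
    root = ET.fromstring(workflow_xml)
    tools = []
    for node in root.iter("Node"):
        gui = node.find("GuiSettings")
        plugin = gui.get("Plugin", "") if gui is not None else ""
        tools.append((node.get("ToolID", ""), plugin))
    return tools


# Print one line per tool: its ID followed by its plugin name.
for tool_id, plugin in list_tools(SAMPLE_WORKFLOW):
    print(tool_id, plugin)
```

For a real file you would read the text of the .yxmd from disk first and pass it to the same function.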

Even if you use some of the Python or R tools, this respective programming code will still be somewhere in the XML code of course.

 

Alteryx & Version Control

So how does Alteryx manage your workflows and achieve version control? The best practice to keep your workflows organised across your company (and not only that) is to rely on Alteryx Server. Among other things, Alteryx Server has offered proper version control since version 10.0. Since Alteryx is all about the VPL I mentioned above, version control must be easily accessible to business users and not just developers. That is why Alteryx Server versions your workflows by default using its own proprietary mechanism (rather than relying on Git, for instance). This way all your workflows are centralised and backed up, and you can always redeploy a previous version of your work.

note: Also, you don't ever need to call your workflow something like WORKFLOW_FINAL_1, WORKFLOW_FINAL_2 etc. See the pic below.

Besides this, you can also use a nice little trick to visually compare your workflows. Something coders would ask about quite frequently.

 

image.png

Alteryx & Git for workflows

As I noted in the previous paragraph, Alteryx Server utilizes its own proprietary version control mechanisms rather than relying on GIT.

Apart from Alteryx Server, can you use Git and commit & push your workflows to your code repo? Yes, by all means.

If your team is primarily technical and there is hunger for "proper" developer's VC and source code management tool then go for it.

Even internal developer teams at Alteryx use Git for the workflows we build, such as the Connect loaders, to keep all history and to let DevOps teams use things like Jenkins for automated builds and integration.

Should you just replace the Server version control with this? I would not think that would be a recommended best practice as you need business users / end users supported too. And Server has your back there. 

Just keep in mind that every small change of your workflow, like moving a tool to a new location will change the XML source code so things like version diff become a little more tricky with GIT only. Best practice there would be to rely on macros and fragment large workflows into smaller bits and pieces so your XML code does not change that massively between edits.

The same goes for merging code which can be a little more tricky due to the way we manage metadata and dependencies in workflows.
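To illustrate the point about layout changes polluting diffs, here is a rough Python sketch that strips canvas positions from the XML before comparing two versions of a workflow. This is not something Alteryx ships; it is only an idea you could wire into your own Git tooling. The two XML fragments are hand-written and simplified, and the assumption that tool coordinates live in a Position element should be verified against your own files. The names strip_layout and logical_diff are hypothetical.

```python
import difflib
import xml.etree.ElementTree as ET
from typing import List

# Hand-written, simplified workflow fragment. Element names are assumptions.
BEFORE = """<AlteryxDocument>
  <Nodes>
    <Node ToolID="1">
      <GuiSettings Plugin="AlteryxBasePluginsGui.Filter.Filter">
        <Position x="54" y="54" />
      </GuiSettings>
      <Properties>
        <Configuration><Expression>[Sales] &gt; 100</Expression></Configuration>
      </Properties>
    </Node>
  </Nodes>
</AlteryxDocument>"""

# Same workflow after the tool was dragged to a new spot on the canvas.
AFTER = BEFORE.replace('x="54" y="54"', 'x="210" y="126"')


def strip_layout(workflow_xml: str) -> str:
    """Return the XML with every <Position> element removed."""
    root = ET.fromstring(workflow_xml)
    # Materialise the element list first so removals do not disturb iteration.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == "Position":
                parent.remove(child)
    return ET.tostring(root, encoding="unicode")


def logical_diff(old_xml: str, new_xml: str) -> List[str]:
    """Unified diff of two workflows, ignoring canvas positions."""
    old_lines = strip_layout(old_xml).splitlines()
    new_lines = strip_layout(new_xml).splitlines()
    return list(difflib.unified_diff(old_lines, new_lines, lineterm=""))


print(logical_diff(BEFORE, AFTER))  # empty list: only the layout changed
```

A change to the filter expression would still show up in the diff, while a pure move of the tool does not.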

 

Alteryx & Git for Python/ R tools

You can actually go beyond managing just workflow XML files in Git. What if we need to do this the other way around?

This means taking the R scripts or Python code from your existing Git repository and pushing them into the code tools of your Alteryx workflow.

Sure - I actually wrote a short article about that a few weeks back: Python Code Tool Script Runner Macro (Code Injection).

With this bit, you can easily keep all your Python code in a Git repo and "inject" this code into the Python code tool of your workflow.

I have had the same thing planned for the R tool and it should be pretty straightforward too. Anyone wanting to beat me to it? @ShaanM?
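To give a feel for the general idea of code injection from a local Git working copy, here is a minimal Python sketch. It is not the macro from the article, just one way such a thing could work: read a script file from the cloned repo and execute it with some inputs. The function name run_script_from_repo and the file names are made up for this example.

```python
import tempfile
from pathlib import Path


def run_script_from_repo(repo_dir: str, relative_path: str, inputs: dict) -> dict:
    """Read a script from a local Git working copy and execute it.

    `inputs` is exposed to the script as global variables; the resulting
    namespace is returned so the caller can pick up whatever the script set.
    """
    script_path = Path(repo_dir) / relative_path
    source = script_path.read_text(encoding="utf-8")
    namespace = dict(inputs)
    # Compiling with the real file name gives readable tracebacks.
    exec(compile(source, str(script_path), "exec"), namespace)
    return namespace


# Demo with a temporary folder standing in for a cloned repository.
with tempfile.TemporaryDirectory() as repo:
    Path(repo, "double.py").write_text("result = value * 2\n", encoding="utf-8")
    print(run_script_from_repo(repo, "double.py", {"value": 21})["result"])  # 42
```

Keep in mind that exec runs whatever is in the file, so only point this at repositories you trust.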

 

image.png

 

Alteryx Designer and Code Takeaways

  • Designer uses concepts of visual programming
  • Workflows can be visually compared to one another
  • Workflows are versioned on Alteryx Server using proprietary versioning (and this is the best practice that supports all types of users)
  • Alteryx does not rely heavily on GIT and does not use it natively, but you can use GIT alongside it if you want
  • Alteryx workflows are XML code -> you can use GIT to manage that XML code
  • You can use GIT with your existing Python or R scripts. Alteryx Python Tool allows you to dynamically inject code from a flat file location (from a local GIT repo) with a simple script - see details above

Promote & Code & Versioning

Alteryx Promote definitely has your back when it comes to managing & deploying your predictive models. This will, in reality, mean managing quite a bit of R or Python code of course. Most of the time anyway. Note: Alteryx predictive models can be deployed too of course!

Let me talk briefly about versioning and GIT integration here regarding Promote:

  • Deployed Promote models are versioned and accessible on Promote web interface (can redeploy and see the history for auditing and regulatory purposes)
  • Source code of models is versioned as part of the Docker image but is not easily retrievable (not through the web-based UI anyway) - developers/ coders should keep using their standard VC/ Git as best practice (include the Promote model version in commit messages etc)
  • Python models must have a certain structure
  • Most Data scientists/ developers will most likely avoid Designer and publish from IDE of their choice directly
  • Sample code from Python/ R is accessible on GitHub through
  • Promote fetches packages from CRAN (R) or PYPI (Python’s PIP) during model build
  • Promote can be set up to retrieve code from your own GitHub repo during model build but also to use the Ubuntu command with sh script to install the env

Promote & “Code Fetch” with CRAN/ PyPI

  • Docker (Swarm) creates a container from the R or Python base image (it's a little more complicated but let's stick to this for now)
  • For R models, Promote fetches the required packages specified in the promote.library() function of your model from the CRAN repository and installs them in the container. note: CRAN is the main repo of R packages (15k), this is where packages come from when you run install.packages()
  • For Python models you provide the libraries you want to install in a requirements.txt file alongside your model's source code. These need to be accessible through PyPI. note: PyPI (Python Package Index) is the main repo of Python packages (200k), installed mainly using pip install
  • Promote can also integrate code snippets from GitHub
  • For that, a GitHub link with a token for the GitHub account needs to be provided among the Python packages.
  • The last option is to use a promote.sh script and provide the libraries you want to install as Ubuntu commands within the shell script, for either R or Python

image.png

image.png

 

Hope you will find this article useful! Let me know if you have any questions or want to ask/share some best practices I forgot about.

 

Regs,

dm

David Matyas
Sales Engineer
Alteryx
7 REPLIES
jamielaird
14 - Magnetar

Epic post @DavidM ! 

 

This topic comes up all the time and it's great to have a comprehensive write-up that I can direct people to in future. Thanks!

_Lisa_
5 - Atom

Thank you for this very informative article @DavidM !

 

I'd like to get your advice on one thing - the data scientist in my team has his Python scripts in Git. I was wondering, is it possible to call those remote scripts from Alteryx Designer using a URL link, the Python tool, etc.?

 

Thanks in advance!

Lisa 

DavidM
Alteryx

Hi @_Lisa_ ,

 

With GIT I believe that the common practice is to GIT PULL first to the local machine and work with a 1:1 copy of your code repo on whichever machine you need to run that Python from.

 

I can imagine though that if you prefer to keep your code in the cloud git repo only, there would typically be an API available for that GIT service that can be used to inject code into the Python Code tool.

 

I for instance had a similar discussion with a customer around BITBUCKET and it supports exactly that:

https://developer.atlassian.com/server/bitbucket/how-tos/command-line-rest/

 

Python injection does not have to happen just with files (i.e. after GIT PULL to your machine). I would think you could replicate the approach mentioned in the article below to load code into the Python Code tool with REST API calls to your GIT repo, if that makes sense.

https://community.alteryx.com/t5/Alteryx-Designer-Discussions/Python-Code-Tool-Script-Runner-Macro-C...
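As a rough sketch of that REST idea (not tested against any particular Git service), the Python below downloads the raw text of a script over HTTPS and executes it. The URL, the bearer token header and the function names fetch_script and run_remote_script are all assumptions for illustration; the exact raw-file endpoint and authentication scheme depend on your Git hosting service, so check its REST documentation.

```python
import urllib.request
from typing import Optional


def fetch_script(url: str, token: Optional[str] = None,
                 opener=urllib.request.urlopen) -> str:
    """Download the raw text of a script.

    `url` must point at a raw-file endpoint of your Git hosting service and
    `token` is an optional access token; both are placeholders here.
    `opener` can be swapped out, e.g. for testing without network access.
    """
    request = urllib.request.Request(url)
    if token:
        # Assumes the service accepts bearer tokens; adjust to your service.
        request.add_header("Authorization", f"Bearer {token}")
    with opener(request) as response:
        return response.read().decode("utf-8")


def run_remote_script(url: str, inputs: dict, token: Optional[str] = None,
                      opener=urllib.request.urlopen) -> dict:
    """Fetch a script and execute it with `inputs` as its global variables."""
    namespace = dict(inputs)
    source = fetch_script(url, token, opener)
    exec(compile(source, url, "exec"), namespace)
    return namespace


# Hypothetical usage inside the Python tool (URL and token are placeholders):
# result = run_remote_script("https://git.example.com/raw/scoring.py",
#                            {"threshold": 0.5}, token="<your token>")
```

The same caution applies as with local files: the downloaded code is executed as-is, so restrict this to repositories and branches you trust.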

 

Hope this helps.

 

dm

David Matyas
Sales Engineer
Alteryx
mouna_belaid
8 - Asteroid

Insightful article! Thank you so much for sharing. This is really helpful!

franc1s
8 - Asteroid

Excellent write up! It is a pity that Git and Alteryx aren't tied together more. I understand the visual programming "constraint" though.

 

Still, on the desktop, it would be great to be able to roll back to a previous commit, or have development branches to try something new, without messing up your workflows. That kind of tie-in would be marvelous.

TimN
13 - Pulsar

Thanks, David.  One question though.  My company stores Macros on a drive mounted to the servers.  We thought that was an acceptable practice.  Since these macros are not stored on or in the Alteryx server then don't you need to have some form of version control in addition to what Alteryx provides?  Is it better to load all macros to the server for that reason?

ThomasT
8 - Asteroid

Thanks for the article @DavidM 

 

We are in the process of setting up a data platform with Alteryx and GIT and have been struggling to find a good solution. The main issue we face is that we want to use a Dev and Prod environment, and every time we update a workflow in Dev and re-submit to GIT we can't just copy that workflow into the Prod GIT folder and overwrite the existing version on the Gallery (i.e. Save as... and overwrite). It seems that you can only overwrite anything on the Gallery if you open the respective workflow from the Gallery, which is a real pain if you want to have a Dev and Prod environment. Our solution is to delete/re-upload a workflow once it has changed, which unfortunately means we can't use the Gallery version control.

 

If you have any suggestions to do that better, would be greatly appreciated! 
