Engine Works

MacRo · 01-25-2016

There are a lot of tools in the Alteryx tool palette, but most of us only ever really use a handful of the ~150 tools that come with a full install of the software. The fact is that once you get a solid handle on just a few tools, it doesn't take long to see just how much you can do with that subset -- and at that point, is it even worth the effort to explore some of the more obscure tools? Well, yes! Yes it is! There are tools buried in that wild jungle of functionality that will change the way you build workflows forever, and once you pick them up, you'll never want to go back.

The Developer toolset can be particularly daunting (Base64 Encoder? Blob what?! I'm not a developer and these tools clearly don't apply to me! Why bother?) -- and in a way, you'd be right -- not all of those tools need to be used by everyone. But there are some absolute gems in there that almost everyone can benefit from. In this post, I'm going to walk you through a simple example to illustrate the awesomeness of the following tools:

Block Until Done (in the Developer toolset)
Run Command (in the Developer toolset)
Dynamic Input (in the Developer toolset)
and, as a bonus... Directory (in the In/Out toolset)

This example came out of helping an Alteryx user with the following problem.

Each day, a data source I need is updated in a staging folder. I need to copy the data from staging over to a local directory (which can take up to 10 minutes), and then run my Alteryx workflow.

I'm going to break the solution up into four parts, walking through a new piece of functionality in each section. The workflow attached to this post contains all four example workflows that you can run on your own machine to see what's going on, but if you want to jump right to the final solution, the Part 4 workflow contains all of the tools mentioned in this post.

Part 1: Using the Run Command tool to copy files with Alteryx

Alright, let's jump into this and see how we can use the Run Command tool to automate the copying of files, so that you never have to do this by hand again. The Run Command tool allows you to execute commands through the Windows Command Line (you can play around with this yourself by typing "cmd" into the Start Menu search bar). If you're not familiar with the Windows Command Line, that's perfectly fine -- it doesn't take much to make things happen, and you can usually find solutions by googling "how to copy files with windows command line" or something similar.

In this case, we're going to use the "copy" command like so:

copy /Y [source] [destination]

(The "/Y" is an optional parameter that says that if the destination file already exists before the copy takes place, then it's ok to overwrite that file.)
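To make the command format above a little more concrete, here's a rough sketch in Python (not Alteryx) of building one copy command per file. The function name and the folder paths are made up for illustration, and it assumes each file keeps its name when it lands in the destination folder.

```python
from pathlib import PureWindowsPath


def build_copy_command(source: str, destination_dir: str) -> str:
    """Return a Windows `copy /Y` command for one file.

    Paths are wrapped in double quotes so that spaces in folder names
    do not split the arguments.
    """
    src = PureWindowsPath(source)
    # Assumption: the copied file keeps its original name.
    dst = PureWindowsPath(destination_dir) / src.name
    return f'copy /Y "{src}" "{dst}"'


# Hypothetical paths, for illustration only.
commands = [
    build_copy_command(path, r"C:\Data\Local")
    for path in [r"C:\Data\Staging\sales.csv", r"C:\Data\Staging\stores.csv"]
]
```

Each string in `commands` is one line of the kind you could paste into a Command Prompt window.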

So what we're going to do in this workflow is define where the source files are located and where we want them to be copied to, use the Formula tool to generate a command that will copy each file to its destination, and then finally execute the commands that we've generated. Here's what this simple workflow looks like:

Pretty straightforward, right? Take a look at the configuration of the relevant tools below (or open up the workflow yourself) to see how it's set up. (I split it up like this so that you could ease into things, and I didn't want to show a big detail-heavy image to introduce the workflow. But going forward, I'll cut to the chase.)

A couple of notes about how the Run Command tool is configured (this is useful information if you haven't used the tool before, but greyed out to help the skimmers out there absorb the general concepts quickly!):

Write Source: An output file is specified so that the commands we've created with the Formula tool are written out as a "batch" file. This is simply a plain text file containing commands to be executed by the command line. Alteryx doesn't recognize batch files as a data type, but you can create a plain text file by selecting the output data type of "csv" and setting the delimiter to "\0" (which represents no delimiter). You can see how it's configured in the workflow, but the key point here is that it's just a regular text file that you could create in Notepad.
Command: Really, you can have the Run Command tool fire up any executable (i.e., application) that you want. So despite its name, you actually have to tell the tool that you want to run a command on the command line. For simplicity, you can also just enter "cmd" here; however, if you package up your workflow into a .yxzp file, then when someone opens it up, they will see an import error saying "cmd" is not a file that exists in the package. This is totally benign, but can be avoided altogether by specifying the full path to cmd.exe.
Command Arguments: The "/c" is necessary here to tell the command line to run what comes afterward. This may be a bit confusing, but as a rule of thumb, when calling the command line, put "/c" before the command. For example, you could put "/c mkdir c:\new_folder" (the mkdir command simply creates a new empty folder).
Run options: The "Run Minimized" and "Run Silent" options are set so that all this copying business happens quietly in the background without popping up a distracting Command Prompt window on the screen. (Any issues will still be reported to the logs though.)
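If it helps to see the moving parts outside of Alteryx, the notes above boil down to roughly the following Python sketch: write the commands to a plain text batch file, then hand that file to cmd.exe with "/c" in front of it. The function names are hypothetical, and the cmd.exe path is an assumption about a typical Windows install, so adjust it for your machine.

```python
import subprocess
from pathlib import Path

# Assumed default location of cmd.exe; adjust for your machine.
CMD_EXE = r"C:\Windows\System32\cmd.exe"


def write_batch_file(commands, batch_path):
    """Write one command per line to a plain text batch file."""
    path = Path(batch_path)
    path.write_text("\n".join(commands) + "\n", encoding="utf-8")
    return path


def build_invocation(batch_path):
    """Build the argument list: cmd.exe, then /c, then what to run."""
    return [CMD_EXE, "/c", str(batch_path)]


def run_batch(batch_path):
    """Run the batch file (Windows only) and return its exit code."""
    result = subprocess.run(build_invocation(batch_path), check=False)
    return result.returncode
```

Only `run_batch` actually launches anything, so the first two functions are safe to poke at on any machine to see what would be executed.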

Now go ahead and run the workflow to become inducted into the Hall of Developer Tool Users! It's a small and prestigious group, but everyone here is friendly and is always happy to help solve problems and share the Alteryx tricks they've discovered.

Part 2: Using the Directory tool to identify recently modified files

This is great, but I only want to copy files in the Staging directory that have been modified in the past day.

Now that you're an expert with the Run Command tool, you might be thinking, "I can find recently modified files with the dir command, which I can then parse out with Alteryx." And sure, that might be a fun little game to play, but you can do this very easily with the Directory tool (not a Developer tool, but still one of the lesser-revered tools in the palette, and can be found in the In/Out category).

The Directory tool allows us to specify a directory and the file type we're interested in, and will return a list of all files meeting those specifications. It also returns a ton of fields with details about each file, such as the creation date, modification date, folder, filename, whether or not it is a system file, etc.

Take a look at the config below. In addition to the Directory config, take a look at how easily you can check how long ago the file was modified with the Formula tool!
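For anyone who thinks better in code, the same idea (list the files in a folder, then keep only the ones modified within the past day) looks roughly like this in Python. This is just an analogy, not what the Directory or Formula tools do internally; the function name, the "*.csv" pattern, and the one-day default are assumptions for illustration.

```python
import fnmatch
import os
import time

SECONDS_PER_DAY = 24 * 60 * 60


def recently_modified(directory, pattern="*.csv", max_age_days=1.0, now=None):
    """List (path, age_in_days) for matching files modified recently.

    `now` can be passed in (seconds since the epoch) to make the
    cutoff reproducible; it defaults to the current time.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    recent = []
    with os.scandir(directory) as entries:
        for entry in entries:
            # Skip sub-folders and files of the wrong type.
            if not entry.is_file() or not fnmatch.fnmatch(entry.name, pattern):
                continue
            modified = entry.stat().st_mtime
            if modified >= cutoff:
                recent.append((entry.path, (now - modified) / SECONDS_PER_DAY))
    return sorted(recent)
```

Calling `recently_modified` on a staging folder returns only the files that would need to be copied that day.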

Part 3: Using the Block Until Done tool to do things only after other things have finished running

Sweet! But once the files are done copying over, I want to kick off other processes. Can I do that?

Yes you can! The Block Until Done tool is one of the most useful ones in the toolset since it gives you so much more control over the flow of how your data is processed.

The way it works is pretty simple:

First the data stream coming out of output #1 will be run
Then once #1 is completed, the data stream coming out of output #2 will be run
Finally, the data stream coming out of output #3 will be run
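The ordering described above can be mimicked in a few lines of Python. To be clear, this is a loose analogy and not how the tool is implemented; the function name is made up, and each "branch" stands in for whatever tools hang off one of the outputs.

```python
def block_until_done(records, branches):
    """Feed the full record set to each branch, one branch at a time.

    `records` is consumed completely before any branch starts, and a
    branch only starts after the previous one has returned.
    """
    buffered = list(records)  # wait for every incoming record first
    # Each branch gets its own copy so one branch cannot alter another's input.
    return [branch(list(buffered)) for branch in branches]
```

So if the first branch copies files and the second one reads them, the read cannot start until the copy branch has returned.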

Check it out:

Part 3.5: Let's go on a tangent and take a quick look at runtime events

If you're wondering what that flashy business is that happens at the start of the Part 3 workflow, here's the answer: it's a runtime event. These can be accessed by going to the Workflow configuration (click anywhere on the canvas, then click on the "Events" tab in the Configuration panel). I'll leave the details for another post, but essentially this lets you run commands in the same way you do with the Run Command tool, except that they will be triggered either immediately before or after the workflow is run.

In this case, I added this event because the example depends on the destination files not existing when you run it. So after you run the example once, if it is run a second time, you would not be able to see how the Block Until Done tool controls the flow of processing. So we can use a runtime event to delete the destination files at the start of each run, making it so that you can run the workflow over and over and over again... and see the files being copied over each time! (Aw yea!)

Here's what that configuration looks like:

Part 4: Using the Dynamic Input tool to read in the data after it has been copied

Ok, that's cool, but I want to actually do stuff with the data once it's been copied over.

The Input tool is great and all, but don't you ever wish it were more... dynamic??? I did too, before I discovered the Dynamic Input tool. It's pretty great, and allows you to read in files based on filenames contained within a data stream. This is the final key in building out this workflow. Let me show you how it works.

A couple notes about the Dynamic Input tool:

You need to give it a "template". This is simply letting it know what format the input files will be in, so if you have multiple input file types, it probably makes sense to use more than one Dynamic Input.
The template is just like the Input tool's configuration. A useful tip is to set the "Output File Name as Field" option in the template so that you can see which file each record came from.
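As a rough Python analogy for the two notes above (again, not the tool's actual internals): read every file named in a list, check each one against a "template" of expected columns, and tag each record with the file it came from. The function name and the "FileName" field name are assumptions for illustration, and the sketch only handles CSV files.

```python
import csv
from pathlib import Path


def read_files(paths, expected_columns):
    """Read every CSV in `paths`, checking each against the template columns."""
    rows = []
    for path in paths:
        with Path(path).open(newline="", encoding="utf-8") as handle:
            reader = csv.DictReader(handle)
            # The "template": every file must have the same header row.
            if reader.fieldnames != list(expected_columns):
                raise ValueError(
                    f"{path} does not match the template: {reader.fieldnames}"
                )
            for row in reader:
                row["FileName"] = str(path)  # which file this record came from
                rows.append(row)
    return rows
```

A file with a different layout raises an error here, which is the code equivalent of needing a second Dynamic Input for a second file type.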

The end

Well, that's all I've got. I hope this is helpful for those who haven't dived deep into Alteryx yet, and maybe even inspires you to explore even more of the tools in the Alteryx palette. If you haven't already checked it out, the Data Prep subforum on the Alteryx Community is a great place to share tips and learn more ways to use the tools included with Alteryx.

I Am The Developer Toolset (And So Can You!)
