It's not uncommon to have a situation where you need to conditionally join one dataset to another. Perhaps the most common case is when you want to join one file to another where a date from the first file is between, greater than, or less than a date (or dates) on a second file. The default tools found in the Join section of the tool palette don't provide a simple way of doing this (it can be done, but you need to string several tools together to make it work). There is a better way! Read on...
There is a great macro available in the public Alteryx Gallery called Advanced Join (find it here, but spoiler alert... you can download the attached workflow, which includes this macro, so you don't have to go to the Gallery to get it). The Advanced Join gives you greater latitude than the Join tool. Most notably, you can select records from file A that are unique to A AND intersect with file B. Now you may be thinking, “I can do that by unioning the records from an inner join with records from a left join,” and you would be correct. But it takes two tools to do what one Advanced Join does. More importantly, the Advanced Join allows you to put a conditional statement on your join, which is something you can't do with the Join tool. And it’s this feature - the ability to use conditional statements in a join - which we will focus on for our purpose here.
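To see why the "unique to A AND intersect with B" output is the same as unioning an inner join with the left-only records, here is a minimal pure-Python sketch (the sample rows and field names are hypothetical, not from the actual files used below):

```python
# Hypothetical sample rows illustrating the two-tool equivalence
rows_a = [{"ColumnId": 1}, {"ColumnId": 2}]
rows_b = [{"ColumnId": 2}, {"ColumnId": 3}]
b_ids = {b["ColumnId"] for b in rows_b}

# Inner-join matches from A...
matched = [a for a in rows_a if a["ColumnId"] in b_ids]
# ...unioned with the records found only in A
a_only = [a for a in rows_a if a["ColumnId"] not in b_ids]
unique_and_intersect = matched + a_only
```

Every record of A comes through exactly once, whether or not it matched B - which is what the single Advanced Join option gives you directly.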
Let's get into some examples. I have a file, 'Fruit List', which contains data about various fruits. This file contains a Column Id, a Fruit Name, a Start DateTime and an End DateTime:
I have a second file, 'Greek Alphabet', which contains a Column Id, a Greek letter and a DateTime.
I want to join the two files on ColumnId where the DateTime from Greek Alphabet (file B) is BETWEEN Start DateTime and End DateTime from Fruit List (file A). Here's the workflow and a screenshot of how to configure the Advanced Join:
And here are what my results look like:
Only one record from Greek Alphabet matched one from Fruit List on ColumnId where Greek Alphabet's DateTime was between Fruit List's Start DateTime and End DateTime.
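The logic the Advanced Join applies here can be sketched in plain Python. The sample rows below are hypothetical stand-ins for the two files (the actual values in the workflow differ), but the condition is the same: match on ColumnId, then keep only pairs where B's DateTime falls in A's range.

```python
from datetime import datetime

# Hypothetical stand-ins for the two input files
fruit_list = [
    {"ColumnId": 1, "Fruit": "Apple",
     "Start": datetime(2020, 1, 1), "End": datetime(2020, 1, 31)},
    {"ColumnId": 2, "Fruit": "Pear",
     "Start": datetime(2020, 2, 1), "End": datetime(2020, 2, 28)},
]
greek_alphabet = [
    {"ColumnId": 1, "Letter": "Alpha", "DateTime": datetime(2020, 1, 15)},
    {"ColumnId": 2, "Letter": "Beta",  "DateTime": datetime(2020, 3, 10)},
]

# Join on ColumnId where B's DateTime is BETWEEN A's Start and End
joined = [
    {**a, **b}
    for a in fruit_list
    for b in greek_alphabet
    if a["ColumnId"] == b["ColumnId"]
    and a["Start"] <= b["DateTime"] <= a["End"]
]
# Only the Alpha row survives: Beta's DateTime is outside Pear's range
```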
In the next example, I have the same Fruit List file and want to join it to another file, Greek Alphabet, that contains just one DateTime field:
The first thing to note is that both files have a field called 'DateTime.' We'll want to give these unique names to avoid ambiguity when we write our conditional statement in the Advanced Join configuration.
I want to join both files on ColumnId but only when DateTime from Fruit List is LESS THAN DateTime from Greek Alphabet:
And the results...:
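As a sketch of what this second configuration does, here is the same pattern in Python, with the two DateTime fields renamed (FruitDateTime and GreekDateTime are hypothetical names chosen to avoid the ambiguity noted above):

```python
from datetime import datetime

# Hypothetical rows; each file's DateTime has been given a unique name
fruit_list = [
    {"ColumnId": 1, "Fruit": "Apple", "FruitDateTime": datetime(2020, 1, 1)},
    {"ColumnId": 2, "Fruit": "Pear",  "FruitDateTime": datetime(2020, 6, 1)},
]
greek_alphabet = [
    {"ColumnId": 1, "Letter": "Alpha", "GreekDateTime": datetime(2020, 3, 1)},
    {"ColumnId": 2, "Letter": "Beta",  "GreekDateTime": datetime(2020, 3, 1)},
]

# Join on ColumnId, but only where Fruit List's date is LESS THAN Greek's
joined = [
    {**a, **b}
    for a in fruit_list
    for b in greek_alphabet
    if a["ColumnId"] == b["ColumnId"]
    and a["FruitDateTime"] < b["GreekDateTime"]
]
# Apple qualifies (Jan < Mar); Pear does not (Jun is not < Mar)
```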
Let's look at one last example. This time, I'm going to use the Fruit List and Greek Alphabet files used in the first example (Fruit List has a Start DateTime and an End DateTime). I'm interested in matching records where DateTime from Greek Alphabet is BETWEEN Start DateTime and End DateTime from Fruit List. I'm not matching on ColumnId this time.
For the Advanced Join configuration, I'm going to cross join my files. (CAUTION: the resulting join could contain as many rows as the product of the number of rows of the incoming datasets - a.k.a. Cartesian join - depending on how restrictive your conditional is. This means if you're joining 2 datasets that contain a million records each, the resulting dataset could contain as many as one trillion records! ). If I had wanted to match on ColumnId, I would have had to do that separately using a Join tool. The cross join option only allows you to apply a conditional statement:
Results from our 3rd example:
Notice how 10 records from Greek Alphabet were joined to just one record from Fruit List.
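A cross join with a condition is just "pair everything with everything, then filter." This sketch mirrors the result above - ten B rows landing on a single A row - using hypothetical data; note that the comprehension visits every one of the len(A) × len(B) candidate pairs, which is exactly why the row-count caution matters.

```python
from datetime import datetime
from itertools import product

# One hypothetical Fruit List row with a wide date range
fruit_list = [{"Fruit": "Apple",
               "Start": datetime(2020, 1, 1),
               "End": datetime(2020, 12, 31)}]
# Ten hypothetical Greek Alphabet rows, all dated within that range
greek_alphabet = [
    {"Letter": f"Letter{i}", "DateTime": datetime(2020, 6, i)}
    for i in range(1, 11)
]

# Cross join: every A row paired with every B row, then the condition applied
joined = [
    {**a, **b}
    for a, b in product(fruit_list, greek_alphabet)
    if a["Start"] <= b["DateTime"] <= a["End"]
]
# All 10 Greek rows fall inside the single Fruit row's range
```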
The Advanced Join tool can save you time and a lot of headaches when you want to join files using a conditional statement. It has some limitations - you can only join two datasets and include one conditional statement per tool, plus the cross join caveat mentioned above - but the Advanced Join provides greater capability and flexibility than the standard Join tool.
This question is highly relevant when you want to circumvent Alteryx errors in a workflow that has been hardcoded to expect certain field names. When you are processing multiple files with different field schemas, you don't want Alteryx to stop processing all the files just because one file fails. Ideally, you want Alteryx to skip that file and carry on!
Attached is a sample workflow that can help you do exactly that:
The workflow will start with a Directory tool browsing to the folder location that has all your input files
As this example uses xlsx files, we use a Formula tool to append the sheet name to the full path
We then feed this data into a Batch Macro
The batch macro allows us to process one file at a time but each file goes through the same process
The input file within the macro is replaced by the incoming file path
We then use a RecordID tool so we can preserve data integrity when we transpose the data for the filtering process downstream
In the Transpose tool, we use the RecordID field as the key field to pivot on. We then transpose the rest of the data fields to create a Name field (which will hold your field headers) and a Value field (which will hold your data). This part stays dynamic if new fields get added: because 'Dynamic or Unknown Fields' is checked, any new fields will fall into the Name field, which we reference in the Filter tool
Within the Filter tool, you can now add in the field variables you need for your workflow
On the true side of the filter, you will now have the fields and values which met your criteria and on the False, you will have the fields which did not
The ultimate goal though is to bring back the whole dataset if that file had fields which met your criteria
To do this we use the Formula Tool to add in the file path of the file which we can use outside of the macro to bring together the whole dataset
The final tool is the Cross Tab tool, which pivots the data back into its original format using the RecordID field as the group-by field.
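The transpose → filter → cross tab sequence inside the macro can be sketched in Python. The rows and the required field names below are hypothetical; the shape of the logic is what matters.

```python
# Hypothetical input rows, already stamped with a RecordID
rows = [
    {"RecordID": 1, "Fruit": "Apple", "Colour": "Red"},
    {"RecordID": 2, "Fruit": "Pear",  "Colour": "Green"},
]
required_fields = {"Fruit"}  # hypothetical field names your Filter checks for

# Transpose: one (RecordID, Name, Value) triple per data field
long_form = [
    {"RecordID": r["RecordID"], "Name": k, "Value": v}
    for r in rows
    for k, v in r.items()
    if k != "RecordID"
]

# Filter on field name: the true side holds fields that met the criteria
true_side = [t for t in long_form if t["Name"] in required_fields]

# Cross tab: pivot back to one row per RecordID
wide = {}
for t in true_side:
    wide.setdefault(t["RecordID"], {})[t["Name"]] = t["Value"]
```

Because the transpose step iterates over whatever fields each row happens to have, new or unknown fields flow through it without breaking anything - the same reason the 'Dynamic or Unknown Fields' option keeps the macro schema-proof.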
Save the macro (File>>Save As)
Insert the macro into the workflow and connect it to the Formula tool. In the configuration of the macro, choose 'Full Path' from the dropdown. This will update the Input tool and the Formula tools
The two outputs on the macro refer to the true and false sides of the filter. You can now use a Join tool and connect the true and false outputs to the left and right inputs of the Join tool
The field you will join on will be the full path and RecordID
Now, if a file met your condition in the filter, it will have values in both macro outputs. Therefore, in the 'J' output of the Join tool, you should see the data from your successful files
In the 'R' output, you should see all the data from the files which did not meet your condition, as they have nothing to join to on the left 'true' side of the filter
You can then paste your desired workflow to the 'J' output of the join tool and continue your data process
This will now allow only files with the desired field headers to pass through, and you have prevented your workflow from breaking when files with an incorrect field schema come through.
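The overall pattern - list the files in a folder, check each one's headers against the expected schema, and skip rather than fail on a mismatch - can be sketched in plain Python. Everything here is an assumption for illustration (CSV inputs instead of xlsx, a made-up REQUIRED set, a made-up function name), not the macro itself.

```python
import csv
from pathlib import Path

REQUIRED = {"Fruit", "Colour"}  # hypothetical expected field names

def load_matching_files(folder):
    """Read every CSV in `folder`, skipping files whose headers
    don't contain the required fields instead of erroring out."""
    good, skipped = [], []
    for path in Path(folder).glob("*.csv"):
        with open(path, newline="") as f:
            reader = csv.DictReader(f)
            if REQUIRED.issubset(reader.fieldnames or []):
                for row in reader:
                    # Keep the source path, as the macro's Formula tool does
                    row["FullPath"] = str(path)
                    good.append(row)
            else:
                skipped.append(str(path))
    return good, skipped
```

The key design choice matches the macro's: the schema check happens before any row is consumed, so a bad file costs one header read and the run continues.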
S/O to Shaan Mistry who brainstormed this workflow with me.