This site uses different types of cookies, including analytics and functional cookies (its own and from other sites). To change your cookie settings or find out more, click here. If you continue browsing our website, you accept these cookies.
Sometimes you look at the steaming pile of data before you and wonder how you’ll ever get it in the form you need. Every option seems to require a great deal of manual labor, and as a lazy– er that is , as a data blending professional , that is simply something you will not abide.
Trying to convert all of your old, mundane Excel workbooks into Alteryx workflows? The Running Total Tool could be the key to your success! You know, it’s that tool in the Transform category with the little running man picture on it.
The Transpose tool pivots data for all selected fields. The column headers are listed in the name field and the corresponding data items are listed in the value field. You can also select key fields which will remain unchanged through the transformation. The transpose tool is often used in conjunction with the Cross Tab tool, which essentially works in the opposite direction.
Is there a way to avoid the Cross Tab Tool from transferring all the input information alphabetically? Simply add a RecordID to your records and add the RecordID field as a grouping field in your Cross Tab Tool to keep the order!
We frequently get questions about how to flag rows in a data set that are missing values in any column.
In the example below, we will demonstrate how to add a new column to the data that gives a count of null or empty values per row. The end result will give a count of 1 to each of the highlighted rows in the image. The new column can be used in a filter tool to isolate rows of data that have missing values.
The Excel number for a modern date is most easily calculated as the number of days since 12/30/1899 on the Gregorian calendar.
Attached please find a v10.0 workflow that converts a double (Excel date) to a Datetime in Alteryx.
It uses this function in a Formula Tool:
where Datebase = a datetime value set to "1899-12-30 00:00:00"
and Field1 is double value = 42397
You monitor the mileage of multiple trucks as they deliver shipments over the course of a week and record additional information regarding each truck in a file (Truck Metrics). Each truck’s cumulative mileage per day is recorded in a separate file (Truck Mileage). Your goal is to update the mileage-related fields in Truck Metrics with the values recorded in Truck Mileage. Note: today is Tuesday so only fields for Monday and Tuesday will be updated in in the Truck Metrics file.
Manually Select Fields to Update
Whether the data is uniquely identified by a Truck Number (or Record ID) or identically ordered in both files, data from Truck Metrics (the table to be updated) and Truck Mileage (the data used for updating) can be Joined together. Then, using the Select functionality within the Join tool, I can manually choose the fields from Truck Mileage that I want to use to replace the fields that need to be updated (Figure 1).
Figure 1: Fields from Truck Mileage (Yellow) replace fields from Truck Metrics (Blue). Note that fields in yellow are selected while fields in blue have been deselected. Fields that need to be included from Truck Metrics (Red) remain selected.
Fantastic! A simple, straightforward way to update fields! But, as any analyst knows, working with data is rarely simple or straightforward. What if you’re dealing with 20 fields that need to be updated, not just 2? In that case, manually selecting fields to update is not only tedious but also error-prone. For these types of situations, I recommend a process that allows for a more dynamic approach.
'Verticalize' the Data to Dynamically Update Fields
Transposing, or ‘verticalizing’ data, allows for a more dynamic workflow when you have unknowns in your processing. In a scenario such as this one, you may have an unknown or changing number of fields that will need be updated in Truck Metrics. Using this approach, we’ll first Transpose both data sets to configure the Field Name and its associated value in a single row (Figure 2).
Figure 2: The data is transposed from Truck Mileage and Truck Metrics. The highlighted fields in Truck Metics indicates which fields will be updated, as the same field exists in Truck Mileage.
Then, we’ll Join our datasets based on two fields: Truck Number AND Name. This ensures that fields in Truck Mileage will match to the correct fields in Truck Metrics, assuming the fields names in both tables are named in the same way. The only selecting we’ll have to do is to make sure all fields from the Right data source (in this case, Truck Metrics) are deselected (Figure 3). This allows that, in the situation of matched Truck Numbers and field names, the updated values will be used.
Figure 3: The Joined fields indicate the fields that exist in both Truck Mileage and Truck Metrics. Fields in yellow (Truck Mileage) are selected to reflect updated values in downstream tools. Fields in blue (Truck Metrics) are deselected.
Note that any unmatched fields have fallen out of the Right side of the Join:
To add them back into the data stream, simply Union the Center and Right Joins together, setting the tool to “Auto Configure by Name”. Then, to rearrange the data into its original format, use the Cross Tab tool (Figure 4). And, voila! Your data is updated!
Figure 4: The updated fields are highlighted in yellow. All other fields originally included in Truck Metrics are included the in the final results as well.
Recently a couple questions came across the Customer Support desk asking how a fiscal calendar could be incorporated into a workflow. Natively Alteryx doesn’t have a tool to create one, but Alteryx does have a number of tools to make a fiscal calendar. Here is an example of how this can be done.
1. Determine the start and end dates of the fiscal calendar and enter them into Text input tool, each on an individual row
2. Connect a TS Filler tool to generate dates between the start and end dates
3. A Select Tool was added to remove unnecessary fields
4. Add a Multi-Row Formula Tool to create a Day of Week field by assigning each row a day of the week from 1 to 7
5. Add another Multi-Row Formula Tool to calculate the fiscal week. Whenever the Day of Week is 1 add a value of 1 to the previous row’s fiscal week value. This will create a running week value for the entire year
An example workflow is attached. Also in example workflow is an example of how the fiscal month and week of month may be added. (Workflow is in Alteryx 10.6 version.)
Question I have a table of sales data with each column being a week's worth of sales. I only want records that have data in each of those fields and want to filter out all records that have Null values. How can I do this?
Answer There are two basic elements necessary to make this happen. The first is that all records in the original table have a unique ID. If you do not have a unique ID in your data, go ahead and add a Record ID Tool.
In the sample data you can see we will want data from Rows 1 and 6 while filtering out each of the other records because they contain null values.
From here we will use the Transpose Tool to pivot your data into 3 separate columns. In the transpose field choose your unique ID as the KEY FIELD and make sure all other records are selected as DATA FIELDS.
The result is that you will have your unique ID field, a field called [Name] which contains the names of each of the fields in your data, repeated for every unique ID in your original data, and a [Value] field which contains the individual values for each of the records for each of the columns in the original data.
Now we want to search for Nulls, and get a comprehensive list of the UniqueID values that do not contain Null values. Now is the time to bring in a Summarize tool and GroupBy your unique ID field, and then use the CountNull action.
The result is a list of how many nulls exist in each of your unique ID groups.
Next we can simply filter out the fields that have 0 null values in them and then use the unique IDs to join back to the original data, and pull only those records.
It's important to note here that because I'm only interested in the original fields I intentionally chose to deselect the unique ID and the Null Count fields from the output of the join so that I am left with only those records that have data in all of the weeks.
See the attached v10.5 workflow for an example of the approach above.
You have multiple fields in your data that correspond to customer addresses. Some customers may have more than one address listed in their record. However, you want to whittle that list to one address per customer. That one address is the first, non-null address found when moving sequentially across a set of fields that contain address data.
For our example, we have three fields of data related to addresses: Address1, Address2 and Address3. The preferred field of data to use is Address1. However, if Address1 does not contain data, then we’ll use the data in Address2. If both fields of Address1 and Address2 do not have data, then we’ll use the data in Address3. The final output should contain a single list of the addresses highlighted in yellow in Figure 1.
Figure 1: The final output will contain a single list of the data highlighted in yellow.
Method 1: Write a Conditional Statement
The most common approach to this type of problem is to craft a conditional statement to select data based on a set of criteria (in this case, order). In the case of our data, it would look something like this:
IF IsNull([Address1]) AND IsNull([Address2]) THEN [Address3]
ELSEIF IsNull([Address1]) THEN [Address2]
ELSE [Address1] ENDIF
However, what if I had 20 fields of addresses instead of 3? Just getting that statement with three fields took me too long to write out! If you do have 20 fields, you might want to start typing that Formula now….
IF IsNull([Address1] AND IsNull([Address2]) AND IsNull([Address3]) AND IsNull([Address4]) AND IsNull([Address5]) AND IsNull([Address6]) AND IsNull([Address7]) AND IsNull([Address8])...
You get the idea. And now you’re thinking, “You’re going to tell me there’s a better way, right?!?” Well, yes...I am!
Method 2: Data Manipulation
An alternative method of solving this problem is to manipulate the data using the Transpose, Filter and Sample tools. I’ll share some advice from @RodL here: “If you want to make something…truly dynamic, then the ‘best practice’ when you are dealing with an unknown number of columns is to ‘verticalize’ the data”. In our case, we may know the total number of columns of address data we have; what we don’t know is which column the data we want is actually in.
Following @RodL’s suggestion, we’ll ‘verticalize’ the addresses using the Transpose tool. This stacks the addresses for each customer in order of the fields in the table (Figure 2). We’ll use the Client ID (or Record ID, if you’ve chosen to add one) as our Key Field and the fields that contain address data as our Data Fields.
Figure 2: All address fields per Record ID (or Client ID) are stacked vertically in order of the field sequence.
Since Null values are not usable records for our purposes, we’ll use a Filter to identify all the usable data (non-Null values). Now that our usable data is stacked vertically in order of field selection, we can Sample the first record from each Record ID (or Client ID) group. We’ll configure the Sample tool to identify the First N (where N = 1) Records from a group (Group by Record ID or Client ID).
Figure 3: Sample the first record from every Record ID or Client ID group.
After some data clean-up with a Select tool, we're left with a column of the selected address for each of our customers:
This article is part of the Tool Mastery Series, a compilation of Knowledge Base contributions to introduce diverse working examples for Designer Tools. Here we’ll delve into uses of the Record ID Tool on our way to mastering the Alteryx Designer:
If you're using the Record ID Tool in v11.0+, be sure to familiarize yourself with the tool's interface redesign!
Here at Alteryx, we do our best to keep things simple. Analytics is hard enough, there’s no need to overcomplicate things. That’s why we have the Record ID Tool – true to its name, if your records don’t have a unique identifier or row count just drag this tool onto the canvas. That’s it. Put your feet up, take a deep breath, and celebrate the win. The best part? The Record ID Tool doesn’t stop there – there’s countless applications of the tool that can simply other operations, too. It’s a gift that just keeps on giving:
Use a Record ID field to create primary keys in database tables created by a workflow
Split your output into multiple files using Record IDs to specify precise record counts
Process workflows in “packets” of records leveraging a Record ID - in some cases, this decreases run time
Compare datasets down to the last record by mapping them to a Record ID
Use the modulo (mod) function to make groups of your data from the Record ID field, simplifying otherwise complex reshapes (see examples 1 and 2)
You can also enforce a record order to your datasets using a Record ID (just sort by it), which often comes in handy before reshaping or macro processing. If you’re looking to assign “Group By” Record IDs that reset to unique values of a particular field, try using the Tile Tool.
That’s a lot of operations made simpler by a single tool; it could be a record. Now, if that’s not worth celebrating, we don’t know what is.
By now, you should have expert-level proficiency with the Record ID Tool! If you can think of a use case we left out, feel free to use the comments section below! Consider yourself a Tool Master already? Let us know at email@example.com if you’d like your creative tool uses to be featured in the Tool Mastery Series.
Stay tuned with our latest posts every Tool Tuesday by following Alteryx on Twitter! If you want to master all the Designer tools, consider subscribing for email notifications.
The Dynamic Rename Tool is part of the developer category of tools. It allows the user to quickly rename any or all fields within an input stream by employing the use of different methods.
The user has the option to rename only certain fields, all fields, or even dynamic/unknown fields at runtime (e.g. after a Cross Tab Tool). The option for renaming fields are:
If you're using the Formula Tool in v11.0+, be sure to familiarize yourself with the tool's interface redesign!
In 1925, the small town of Nome in Alaska suffered a diphtheria outbreak. Out of usable antitoxin and in the depths of winter, the town doctor telegrammed the governor in Juneau and authorities in Washington D.C. requesting a million units of antitoxin to stave off an epidemic. Confronted with the task of saving the people of Nome, Alaskan authorities debated their options of getting the antitoxin to Nome. With sub-zero temperatures, limited daylight and a lack of experienced pilots, airlifting the supplies was a less than ideal solution. Instead, a dogsled relay was selected to transport the antitoxin across the 674 miles from Nenana to Nome. The mushers endured near-blizzard conditions and hurricane-force winds to deliver the serum five and a half days later, saving the small town from a deadly outbreak.
While you may not have the fate of a small town on your hands, understanding the time elements in your data is critical for your analyses. Perhaps you need to determine a product’s time in transit between two ports or warehouses, identify non-business days in your dataset to correctly account for time, or re-format data to work on a 12 or 24 hour clock. In all of these cases, time is of the essence!
To demonstrate the use of the Formula Tool’s DateTime functions, and in the spirit of the Alaskan dogsledding tradition, we’ll work with data from the last leg of the Iditarod, which stretches from Safety to Nome. Using the data provided from the 2016 race, we will 1) calculate the elapsed time for each musher between the two checkpoints, 2) determine the Iditarod’s start date and time from a musher’s total race time and arrival time in Nome, and 3) re-format the time format from a 24 to 12 hour clock. Before beginning with Formula Tool’s DateTime functions, make sure that your data should be appropriately formatted as a DateTime field type using the DateTime tool or a Select Tool.*
*The only acceptable string format for use with DateTime functions is yyyy-mm-dd HH:MM:SS, which is Alteryx’s native string format.
The Input data contains the time each musher left Safety (Departure) and the time that they arrived in Nome (Arrival), finishing the race. To calculate the time it took for each musher to complete this final leg of the race, we’ll find the difference between the start (Departure) and end (Arrival) times using the DateTimeDiff function:
DateTimeDiff (<date/time1>, <date/time2>, <units>)
[Arrival] [Departure] ‘seconds’
The expression returns the elapsed time between the Arrival and Departure times as years, months, days, hours, minutes or seconds as a numeric field type. In this case, the expression returns the number of seconds (Int64) between the Departure and Arrival times, which can be used downstream to calculate hours (Seconds/3600) and minutes (Seconds/60).
The sample dataset not only provides data for the final leg of the 2016 Iditarod but also each musher’s time for the entirety of the race (Total Race Time). Using these pieces of information, we can determine the time at which the mushers started the race in Anchorage by subtracting the musher’s total race time from their completion time (Arrival) in Nome. After a bit of parsing and simple math, we can convert a Total Race Time of 8d 11h 20m 16s to its value of 732016 seconds (Total Seconds). Then, we can subtract the race time (in seconds) from each musher’s arrival time in Nome to determine the start time of the race using the DateTimeAdd function:
DateTimeAdd (<date/time>, <interval>, <units>)
[Arrival], -[Total Seconds]*, ‘seconds’
*Because we are looking to determine a DateTime in the past, we use a negative value for the time interval. When looking for a DateTime in the future, ensure that your time interval is a positive numeric value.
After applying our DateTimeAdd function, we find that the 2016 Iditarod began at 2016-03-06 15:00:00. That’s not the easiest format to understand, so let’s take advantage of the power of the DateTime Format function. The DateTime Format tool allows you to indicate a format specifier to parse components of a DateTime field. A format specifier begins with a percent sign (%), followed by a case-sensitive letter. For a complete list of supported format specifiers, please refer to the Formula Tool’s help documentation.
For our particular example, let’s identify the day of the week, the month, date and time (on a 12 hour clock) that the race started. We use the format identifiers to create a string field that contains this information. It is important to note that each format specifier is treated as text and needs to be wrapped in single quotes (see example below).
Using a series of format specifiers, we can see that the 2016 Iditarod began on Sunday, March 6, 2016 at 3:00 PM. When time, whether when dogsledding or managing your business operations, is of the essence, you can count on the DateTime functions to save the day!
Did you know the average football game lasts 3 hours and 12 minutes and only amounts to roughly 11 minutes of play? Now, I love trying to eat Doritos through my TV screen as much as the next guy, but for me the highlights are definitely a better watch. The Summarize Tool would probably agree - the most effective communication of your data is the most concise summary of it. Whether it’s concatenating strings for storage, merging reports to have better readability, getting your spatial objects to interact, or even calculating averages and other formulas on groupings of data, the Summarize Tool can reframe your data to be more informative. This article provides a few examples on how.
Have you ever wanted to do a Cross Tab, but needed the results in a particular order? You've probably discovered that Cross Tab will dutifully change your data to column headings and keep the rest of the data correlated, but that the output columns will be arranged in alphabetical order. It's as if there were a Select tool with the "Sort on Field Name" Option thrown into your workflow without you putting it there.
The Report Map Tool (master it here) allows to create thematic maps by selecting a "Theme" field on the Data tab and then further defining the theme on the Layers tab, for example:
The above example creates a map of Texas showing average annual rainfall totals where orange is the least rainfall and blue the most:
Pretty nice, right? But what if you want to change the map and instead of applying the theme to the fill color for the polygons, you want to apply the theme to the outline and just show that?
That is a little trickier because the Report Map Tool allows you to adjust the outline color and size of your polygons, it doesn't automatically apply the theme, so a workaround needs to be built.
You could feed in each polygon as an individual layer but that is difficult to manage - to keep the color gradient consistent, making sure they are in the right order. And what if a new rainfall range is introduced? You might have to adjust a couple of layers to account for it.
A better option would be to turn the polygon outlines into line polygons themselves. That would allow you to apply a theme right to the outline polygons.
In order to do this, we will use the following tools:
A RecordID is assigned so that we can pull the data apart and put it back together again.
The polygons are split into detailed regions using the Poly-Split tool and rows flagged as 'holes' are removed.
The polygons are split into points.
Those points are reassembled as a sequence polyline. The create the polyline, the data is grouped by the RecordID to keep each polyline separate. (A polyline contains multiple line segments, where a line has one start and one end point, but can have any number of points in between. A polyline can be quite complex as in the case of road systems, or rivers.)
The sequence polylines are joined back to the original data set.
Using the reporting tools to create the maps with rainfall range as the thematic field.
With that workaround you can create a map that looks like this:
For details on tool configurations, see the attached v10.5 workflow.
How do you cut off a certain amount of characters from the right hand of a field?
Starting Values: Blue4509, Blue2034, Yellow2345, Orange2341
Ending Values: Blue, Yellow, Orange
There are many ways to accomplish this - below and attached (v10.5) find just 4 of the options.
Option 1: You could use Left(Data, (Length(Data) - 4)) This method works well if your numbers are all 4 digits.
Option 2A: Regular expression either in the RegEx Tool or using Regex_Replace in the Formula Tool. This option allows for a varied number of digits.
In the Formula Tool, your expression could be REGEX_Replace([Data],"(\l+)\d*","$1")
The \l is looking for lowercase letters
The + specifies that the lowercase letter(s) must appear at least once
The \d is looking for digits
The * specifies that the digits may or may not appear (or is there "zero or more times").
The brackets around the \l+ "mark" that part of the pattern as special
So if the input is "Blue450"
REGEX_REPLACE([Input],"(\l+)\d*","$1") will return "Blue".
Option 2B: If you changed the expression to "mark" the digits portion, you could separate the two marked portions
REGEX_REPLACE([Input],"(\l+)(\d*)","$1 light has a wavelength of at least $2 nanometers") which will return the phrase:
"Blue light has a wavelength of at least 450 nanometers"
It is important to note that this function is case-insensitive by default so a \u (uppercase) would give the same result. If you want the expression to be case sensitive, add a ",0" to the end of the expression.
Grouping = ()
Also, if you use the RegEx Tool (from the Parse toolbox) you can access a list of commonly used RegEx 'tags'...and don't forget to access the Alteryx help with F1. For fun: http://xkcd.com/208/
Option 3: Another RegEx option is to separate the letters from the digits. ([[:alpha:]]+)(\d.+)
[:alpha:] = any letter
\d = digit
+ = one or more times
() = marked group
This article was put together to resolve a common issue with cleansing your data as well as to show the use of tools and techniques that are not normally used for newer users. The goal of the article is to get newer users into these tools to open their creativity with the tool and hopefully take you to the next level!
In this use case, the data in the attached workflow is messy with capitalized strings all over the place. We want to format the data by removing some of the capitalization, but not all of it.
Note: If we wanted to make every first letter of the word capitalized we can use the Formula Tool and the TitleCase(String) function. This would make BEAR the WEIGHT - Bear The Weight. See the difference?
The tools that we will be using in this exercise is the Record ID, Text to Columns, RegEx, Formula, Tile, and Cross Tab Tools.
The exercise will show you the importance of using the Record ID Tool. The flexibility of the Text to Columns and RegEx Tools, the under-used Tile Tool, the creativity of the Formula Tool, and the not so scary Cross Tab tool when then data is configured properly.
We hope that these exercise and use cases open up your mind and the greatness of Alteryx!
Enjoy! The attached workflow is in version 10.5.