The Sample Tool allows you to selectively pass patterns, block excerpts, or samples of the records (or groups of records) in your dataset: the first N, the last N, skipping the first N, 1 of every N, a random 1-in-N chance for each record to pass, and the first N%. These options come in clutch pretty often in data preparation – that’s why you’ll find the tool in our Favorites Category, and for good reason. While it's a great tool for sampling your data sets, you can also use it for:
Upon creating a BINGO game, I came across a technique that I thought could be useful in "real world" scenarios for users who are attempting to iterate a process and then replenish the data after a certain amount of time.
The Multi-Row Formula Tool functions much like the normal Formula Tool but adds the ability to reference multiple rows of data within one expression. Say, for example, someone was on the ground floor of a house and had a Formula Tool. They would only be able to talk to the people also on the ground floor. If they had a Multi-Row Formula Tool, though, they would also be able to talk to the people upstairs, in the attic, and in the basement.
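The row-offset idea can be sketched outside Alteryx, too. Here's a minimal Python illustration (the data is made up) of what an expression like [Value] - [Row-1:Value] computes – a running difference where the first row has no previous row to reference:

```python
# Running difference, mimicking a Multi-Row Formula that references
# the previous row; the first row gets None since there is no [Row-1].
values = [10, 13, 13, 20]
diffs = [None] + [b - a for a, b in zip(values, values[1:])]
assert diffs == [None, 3, 0, 7]
```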
The Auto Field Tool: a tool so easy you don’t have to do anything – just put it on your canvas and voila. Automatically optimized data types. If you’re running into data-type-related issues and errors in your workflows, or just looking to add some speed or reduce the disk space your data is hoarding – look no further than the Preparation Tool Category’s Auto Field Tool, which reads through all the records of an input and sets each field’s type to the smallest possible size relative to the data contained within the column.
Users often ask, "How do I do (x) function in Alteryx?" - here's a handy guide for translating Tableau functions for use in Alteryx!
*Please note: This is not a comprehensive list of all functions available in Alteryx and Tableau - only functions that are written differently, but perform similar tasks, are included here. For a list of all the functions available in Alteryx, please refer to our Help Documentation.
Return the smallest integer greater than or equal to [x]. Works like the 'RoundUp' function in Excel.
x % y
Modulo of x divided by y - the modulo operation finds the remainder after dividing one number by another.
Return [x] raised to the [e] power.
Return [x] rounded to nearest multiple of [mult].
IF isnull([field]) THEN 0 else [field] ENDIF
Returns the expression if it is not null, otherwise returns zero. Use this function to use zero values instead of null values.
IF Contains([field], "string") then 1 ELSE 0 ENDIF
Returns true if the given string contains the specified substring.
FIND(string, substring, [start])
Searches for the occurrence of a particular string within a data field and returns the numeric position of its occurrence in the string. In Tableau, returns the index position of substring in string or 0 if the substring isn't found. If the optional argument start is added, the function ignores any instances of substring that appear before the index position [start].*
Return the length of the string [x].
Converts a string to lower case.
REGEX_Match(string, pattern, icase)
Searches a string for an occurrence of a regular expression.
REGEX_Replace(string, pattern, replace, icase)
REGEXP_REPLACE(string, pattern, replacement)
Allows replacement of text using regular expressions and returns the string resulting from the RegEx find pattern and replace string.
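As a rough illustration of what these functions do, here's the same match-and-replace pattern in Python's re module (an analogy for comparison only – Alteryx and Tableau each use their own regex engines, so exact pattern syntax can differ):

```python
import re

# REGEX_Match-style check: Alteryx's REGEX_Match tests the WHOLE string
# against the pattern, so re.fullmatch is the closer analogue; the icase
# flag maps to re.IGNORECASE.
assert re.fullmatch(r"[A-Z]{2}\d{3}", "ab123", flags=re.IGNORECASE)

# REGEX_Replace-style substitution: reformat "(555) 867-5309" into
# dotted form using capture groups referenced in the replacement.
phone = re.sub(r"\((\d{3})\) (\d{3})-(\d{4})", r"\1.\2.\3", "(555) 867-5309")
assert phone == "555.867.5309"
```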
Substring(x, start, length)
MID(string, start, [length])
Return the substring of [x] starting at [start] and stopping after [length], if provided.*
Remove characters in the string y from the end of the string x; y is optional and defaults to trimming whitespace. In Tableau, this function trims extra whitespace.
Converts a string to upper case.
Date Time Functions
DateTimeAdd(datetime, interval, units)
DATEADD(date_part, interval, date)
Return the given date/time modified by the given duration. The <interval> specifies a positive or negative integer of time to add or subtract and <units> is one of a date/time unit - "years", "months", "days", "hours", "minutes", or "seconds". For Tableau, additional date_part units are allowed.
DateTimeDiff(datetime1, datetime2, units)
DATEDIFF(date_part, date1, date2, [start_of_week])
Subtract the second argument from the first and return it as an integer difference. The duration is returned as a number, not a string, in the specified units - "years", "months", "days", "hours", "minutes", or "seconds". For Tableau, additional date_part units are allowed.
DateTimeTrim(datetime, trim type)
DATETRUNC(date_part, date, [start_of_week])
Remove unwanted portions of a date/time and return the modified date/time. Options include: firstofmonth, lastofmonth, year, month, day, hour, minute. In Tableau, truncates the specified date to the accuracy specified by the date_part. This function returns a new date. For example, when you truncate a date that is in the middle of the month at the month level, this function returns the first day of the month. The start_of_week parameter is optional.
Returns the current system date and time.
DateTimeParse(datetime, format of incoming string)
MAKEDATE(year, month, day)
Converts a date string with a specific format to the standard ISO format yyyy-mm-dd HH:MM:SS. In Tableau, returns a date value constructed from the specified year, month, and date.
Returns today’s date, with the time set to midnight at the beginning of the day. In Tableau, returns the current date.
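For readers more comfortable in a general-purpose language, the date functions above map roughly onto Python's datetime module – a sketch of the behavior, not the tools' actual implementation:

```python
from datetime import datetime, timedelta

dt = datetime(2016, 3, 15, 10, 30)

# DateTimeAdd(dt, 10, "days") analogue: add a signed duration.
assert dt + timedelta(days=10) == datetime(2016, 3, 25, 10, 30)

# DateTimeDiff(date2, date1, "days") analogue: integer difference.
assert (datetime(2016, 4, 1) - datetime(2016, 3, 1)).days == 31

# DateTimeTrim(dt, "firstofmonth") / DATETRUNC('month', ...) analogue:
# truncate a mid-month timestamp back to the first of the month.
assert dt.replace(day=1, hour=0, minute=0) == datetime(2016, 3, 1)

# DateTimeParse(str, format) analogue: parse a non-ISO string.
assert datetime.strptime("03/15/2016", "%m/%d/%Y") == datetime(2016, 3, 15)
```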
ToNumber(x, bIgnoreErrors, keepNulls)
INT(expression) or FLOAT(expression)
Converts a string parameter to a number. The second parameter is optional and allows conversion error messages to be ignored; it is a boolean flag and will accept a value of 1, 0, true, or false. An optional third parameter handles Nulls. In Tableau, INT casts its argument as an integer. For expressions, this function truncates results to the closest integer toward zero. FLOAT casts its argument as a number with decimal/float precision.
Casts its argument as a string.
*In Alteryx, string positions start at 0. In Tableau, string positions start at 1.
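A quick illustration of that off-by-one, using Python, which (like Alteryx) indexes strings from 0 and (like Alteryx's FindString, to the best of my understanding – worth verifying in the Help Documentation) reports "not found" as -1:

```python
s = "hello world"

# Alteryx-style, 0-based: a match at the start is position 0.
assert s.find("hello") == 0
assert s.find("xyz") == -1           # not found

# Tableau's FIND is 1-based and returns 0 for "not found", so a
# translation has to shift the result by one.
tableau_style = s.find("world") + 1
assert tableau_style == 7
```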
You've been given data for a new project and it contains lots of extra (and unnecessary) rows before you even get to the information you need to work with. Look familiar?
For many Alteryx users, this situation is all too common. Luckily, there's a pretty easy way to resolve this issue using the Sample and Dynamic Rename tools!
To demonstrate this approach, we'll use some sample data that has extraneous information and space at the top (Rows 1-4) of the spreadsheet in Figure 1 (below). While the information itself might be important, it's going to interfere with our data analysis. What we really want to see is the information in Row 5 as our header name and the information from Row 6 onwards to be our data.
Figure 1: The data in rows 1-4, as seen in Excel, should not be included in the data analysis.
Rather than manually re-format our dataset, we'll bring it into Alteryx and let the data preparation begin! Using an Input Tool, we'll navigate to the location of our data file. The tool gives us a preview of what to expect when bringing in the data (Figure 2). This format is nowhere near perfect, but we still have a few tricks up our sleeve!
Figure 2: The Input Tool shows how the data will be brought into Alteryx. Our heading is not correct, and we still have a few lines of data (in light yellow) to eliminate while keeping the data we want to analyze (in dark yellow).
A quick visual assessment indicates that we'll need to skip the first three rows of data (the information in Row 4 will become our field names). We can remove these rows using a Sample Tool. In the Sample Tool configuration (Figure 3), we'll opt to "Skip the 1st N Records"; in this case, N will be equal to 3.
Figure 3: Set the number of records to skip, or remove, from the top of the dataset.
Now that we've removed the first 3 rows of data, we are much closer to the version of the data format we'd like to work with. The data we'd like to use as the field names (Number, FirstName and State) are now in the first row of data. We'll use the Dynamic Rename Tool to rename our fields using the option to "Take Fields from the First Row of Data" (Figure 4). And, voila! Our data is now ready to use for the next steps of our analyses.
Figure 4: After removing unwanted rows of data and re-naming the fields, our data is ready for further analyses.
*See the attached sample workflow (v10.5) for an example of this process.
Binary (bit level or bitwise) operations operate on one or more bit patterns or binary numerals at the level of their discrete bits. They are typically used to manipulate values for comparisons and calculations - and encryption!
These functions are in the Formula Tool, in the Integer submenu of the Math menu:
Binary values are stored as strings, retaining the leading zeros. To use these functions, you'll have to convert these to numerals. Use the BinToInt() function to convert your binary strings to numeric values. Once you've completed your calculations, use IntToBin() to get the binary values. Note: you'll need to add your leading zeros back in, using the PadLeft function.
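The round trip described above can be sketched in Python, where int(s, 2), bin(), and str.zfill() play the roles of BinToInt, IntToBin, and PadLeft (the 8-bit width is an assumed example):

```python
bits = "00000101"            # binary string with leading zeros (decimal 5)
n = int(bits, 2)             # BinToInt analogue: string -> numeric value
assert n == 5

doubled = n * 2
out = bin(doubled)[2:]       # IntToBin analogue -- leading zeros are lost
assert out == "1010"

padded = out.zfill(8)        # PadLeft analogue: restore the leading zeros
assert padded == "00001010"
```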
If you need the character associated with the binary value, use CharToInt(). Hex calculations work similarly.
BinaryAnd(n,m) – performs the logical AND operation on two binary numerals
BinaryNot(n) – performs logical negation, forming the complement (bits that are 0 become 1, and those that are 1 become 0)
BinaryXOr(n,m) - performs exclusive disjunction, which essentially means 'either one, but not both nor neither'. In other words, each result bit is true if and only if one input bit is true and the other is false.
A common use for XOR is a means of doing a parity check. A bitstring has even parity if the number of 1s in the string is even. It has an odd parity if the number of 1s is odd. If you XOR the bits together, you can tell whether a bitstring has even or odd parity.
ShiftLeft(n,b) / ShiftRight(n,b) - shifting left is equivalent to multiplication by powers of 2. So 5 << 1 is equivalent to 5 * 2, and 5 << 3 is equivalent to 5 * 8. Shifting to the right is equivalent to division by powers of 2.
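Here's a small Python sketch of the parity check and the shift arithmetic described above (Python's operators behave the same way on non-negative integers):

```python
from functools import reduce

# Parity check: XOR all the bits together; 1 means odd parity, 0 even.
def parity(bitstring: str) -> int:
    return reduce(lambda a, b: a ^ b, (int(c) for c in bitstring))

assert parity("1011") == 1   # three 1s -> odd parity
assert parity("1001") == 0   # two 1s  -> even parity

# Shifts as multiplication/division by powers of 2:
assert 5 << 1 == 5 * 2       # 10
assert 5 << 3 == 5 * 8       # 40
assert 40 >> 2 == 40 // 4    # right shift = floor division by 2**b
```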
Please see the attached v10.5 workflow for a simple secret message conversion.
The Association Analysis Tool allows you to choose any numerical fields and assesses the level of correlation between those fields. You can use the Pearson product-moment correlation, the Spearman rank-order correlation, or Hoeffding's D statistic to perform your analysis. You also have the option of doing an in-depth analysis of your target variable in relation to the other numerical fields. After you’ve run the tool, you will have two outputs:
The Multi-Field Formula Tool offers the same functionality as the Formula Tool, with the added benefit of applying a function across multiple fields of data all at once. Gone are the days of writing the same function for multiple fields.
Say there are four fields with dollar signs ($) that need to be removed. It could be done with a Formula Tool and a function written for each field:
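To make the idea concrete outside the tool, here's a plain-Python sketch (the quarter columns and values are made up): one function applied in a loop over every listed field, which is essentially what the Multi-Field Formula Tool does for you:

```python
# One record with four money fields; in Alteryx this would be a row
# with the Multi-Field Formula applied to all selected fields at once.
rows = [{"Q1": "$100", "Q2": "$250", "Q3": "$75", "Q4": "$310"}]
money_fields = ["Q1", "Q2", "Q3", "Q4"]

def strip_dollar(value: str) -> str:
    return value.replace("$", "")

for row in rows:
    for field in money_fields:       # the "multi-field" loop
        row[field] = strip_dollar(row[field])

assert rows[0] == {"Q1": "100", "Q2": "250", "Q3": "75", "Q4": "310"}
```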
You have a dataset containing information on customers’ survey responses (Y/N), the Customer Segment (Corporate, Small Business, etc) to which they belong, and other location data. You have been tasked with finding the percent of each Responder type in the entire data set. To perform these calculations, we’ll need two types of counts of data. The first is a “conditional count”, or a count of records that meet certain criteria. The second is a count of all records in a dataset. Alteryx has two nifty ways to help us obtain these values. We’ll use both the Count Records and Summarize tools to help us with these tasks!
Use the Summarize Tool’s “Count” function
The Summarize tool allows us to count the number of records that meet certain criteria. For our particular examples, we want to find the number of records for each Responder Type, Yes or No. We’ll use the Summarize tool to Group by “Responder”. Then, we’ll Count the number of Customer IDs for each Responder type (Figure 1).
Figure 1: The Summarize tool will Count all records, Count NonNull records, CountDistinct (Unique) records and CountDistinct (NonNull) records.
Want to drill down in your data even more? How about find the number of Responder Types per Customer Segment? Again, the Summarize tool can help! Group by “Customer Segment”, then by “Responder”, then Count the “Customer IDs”. See the attached workflow to see this example in action.
Use the Count Records Tool
To calculate the percent of each response type for our entire dataset, we’ll need to know the total number of responders in our dataset. While there are a few ways to go about getting that number, I’ll highlight the use of the Count Records Tool (well, macro, technically). “The WHAT?” you ask? I’ve heard that before.
The Count Records Tool. It’s easy to miss and I can count the number of times I’ve seen this tool in a user's workflow on one hand. However, it’s one of those tools for which you quickly find so many uses! It does exactly what its name suggests: it counts the number of records from an incoming data stream. The tool itself requires no configuration. Simply insert the tool into your workflow and receive a count of the number of records in the incoming dataset, even if there are zero records from incoming data* (Figure 2):
*The Count Records tool will return a record with value of 0, whereas the Summarize tool will simply not return any records.
Figure 2: The Count Records tool has no configuration and returns the number of records (rows) from an incoming data stream.
Now that we have the counts that we need for our calculations, we're ready to move forward with our data analysis! Please see the attached workflow for the completed demonstration of this process.
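For comparison, here's the same pair of counts sketched in plain Python on a made-up set of survey responses – a grouped count standing in for the Summarize tool and a simple length standing in for the Count Records tool:

```python
from collections import Counter

responses = ["Y", "N", "Y", "Y", "N"]   # made-up Responder values

by_responder = Counter(responses)        # Summarize-style grouped count
total = len(responses)                   # Count Records-style total

assert by_responder == {"Y": 3, "N": 2}
assert total == 5

# Percent of each responder type across the whole dataset:
assert round(100 * by_responder["Y"] / total) == 60
```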
Oftentimes in data preparation, the need for order in your records will arise. When that situation occurs, the Sort Tool has your back. It’s just that sort of tool. Effortlessly arranging your records – be it in alphabetical, numeric, or chronological order – while not quite a mind-numbingly complex operation, has ample utility. Sorting your records upstream of many tools can even optimize processing time. The fairly simple use cases below are techniques that frequently pop up in the data blending trenches:
You monitor the mileage of multiple trucks as they deliver shipments over the course of a week and record additional information regarding each truck in a file (Truck Metrics). Each truck’s cumulative mileage per day is recorded in a separate file (Truck Mileage). Your goal is to update the mileage-related fields in Truck Metrics with the values recorded in Truck Mileage. Note: today is Tuesday, so only fields for Monday and Tuesday will be updated in the Truck Metrics file.
Manually Select Fields to Update
Whether the data is uniquely identified by a Truck Number (or Record ID) or identically ordered in both files, data from Truck Metrics (the table to be updated) and Truck Mileage (the data used for updating) can be Joined together. Then, using the Select functionality within the Join tool, I can manually choose the fields from Truck Mileage that I want to use to replace the fields that need to be updated (Figure 1).
Figure 1: Fields from Truck Mileage (Yellow) replace fields from Truck Metrics (Blue). Note that fields in yellow are selected while fields in blue have been deselected. Fields that need to be included from Truck Metrics (Red) remain selected.
Fantastic! A simple, straightforward way to update fields! But, as any analyst knows, working with data is rarely simple or straightforward. What if you’re dealing with 20 fields that need to be updated, not just 2? In that case, manually selecting fields to update is not only tedious but also error-prone. For these types of situations, I recommend a process that allows for a more dynamic approach.
'Verticalize' the Data to Dynamically Update Fields
Transposing, or ‘verticalizing’, data allows for a more dynamic workflow when you have unknowns in your processing. In a scenario such as this one, you may have an unknown or changing number of fields that will need to be updated in Truck Metrics. Using this approach, we’ll first Transpose both data sets to configure the Field Name and its associated value in a single row (Figure 2).
Figure 2: The data is transposed from Truck Mileage and Truck Metrics. The highlighted fields in Truck Metrics indicate which fields will be updated, as the same fields exist in Truck Mileage.
Then, we’ll Join our datasets based on two fields: Truck Number AND Name. This ensures that fields in Truck Mileage will match to the correct fields in Truck Metrics, assuming the fields in both tables are named the same way. The only selecting we’ll have to do is to make sure all fields from the Right data source (in this case, Truck Metrics) are deselected (Figure 3). That way, wherever Truck Numbers and field names match, the updated values will be used.
Figure 3: The Joined fields indicate the fields that exist in both Truck Mileage and Truck Metrics. Fields in yellow (Truck Mileage) are selected to reflect updated values in downstream tools. Fields in blue (Truck Metrics) are deselected.
Note that any unmatched fields have fallen out of the Right side of the Join:
To add them back into the data stream, simply Union the Center and Right Joins together, setting the tool to “Auto Configure by Name”. Then, to rearrange the data into its original format, use the Cross Tab tool (Figure 4). And, voila! Your data is updated!
Figure 4: The updated fields are highlighted in yellow. All other fields originally included in Truck Metrics are included in the final results as well.
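The verticalize-and-join pattern translates to other environments as well. Here's a small Python sketch (truck numbers and field names are hypothetical): each table becomes (truck, field) → value pairs, and a key-wise merge plays the role of the Join plus Union, with the mileage values winning wherever keys match. Pivoting back to wide format would correspond to the Cross Tab step:

```python
# Verticalized tables: one (truck, field) -> value entry per row.
metrics = {(101, "Mon"): 0, (101, "Tue"): 0, (101, "Capacity"): 40}
mileage = {(101, "Mon"): 120, (101, "Tue"): 215}

# Merge on the (truck, field) key; entries from mileage overwrite the
# matching entries in metrics, and unmatched metrics fields fall through
# untouched -- the Join (matched) plus Union (unmatched) in one step.
updated = {**metrics, **mileage}

assert updated == {(101, "Mon"): 120, (101, "Tue"): 215,
                   (101, "Capacity"): 40}
```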
Recently, a couple of questions came across the Customer Support desk asking how a fiscal calendar could be incorporated into a workflow. Alteryx doesn’t have a native tool to create one, but it does have a number of tools that can be combined to make a fiscal calendar. Here is an example of how this can be done.
1. Determine the start and end dates of the fiscal calendar and enter them into a Text Input tool, each on an individual row
2. Connect a TS Filler tool to generate dates between the start and end dates
3. Add a Select Tool to remove unnecessary fields
4. Add a Multi-Row Formula Tool to create a Day of Week field by assigning each row a day of the week from 1 to 7
5. Add another Multi-Row Formula Tool to calculate the fiscal week. Whenever the Day of Week is 1, add a value of 1 to the previous row’s fiscal week value. This will create a running week value for the entire year
An example workflow is attached. It also shows how the fiscal month and week of month may be added. (Workflow is in Alteryx 10.6 version.)
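As a cross-check of the logic in steps 1–5, here's a small Python sketch (the fiscal year dates are an assumed example): generate each day, assign a 1–7 day-of-week counted from the fiscal start, and bump a running week number whenever the counter rolls back to 1:

```python
from datetime import date, timedelta

start, end = date(2016, 7, 1), date(2017, 6, 30)  # assumed fiscal year

rows, week = [], 1
prev_dow = None
d = start
while d <= end:
    dow = (d - start).days % 7 + 1     # day of week, 1..7, from fiscal start
    if dow == 1 and prev_dow is not None:
        week += 1                      # Multi-Row step: previous week + 1
    rows.append((d, dow, week))
    prev_dow = dow
    d += timedelta(days=1)

assert rows[0] == (date(2016, 7, 1), 1, 1)
assert rows[7] == (date(2016, 7, 8), 1, 2)   # second fiscal week begins
```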
Question I have a table of sales data with each column being a week's worth of sales. I only want records that have data in each of those fields and want to filter out all records that have Null values. How can I do this?
Answer There are two basic elements necessary to make this happen. The first is that all records in the original table have a unique ID. If you do not have a unique ID in your data, go ahead and add a Record ID Tool.
In the sample data you can see we will want data from Rows 1 and 6 while filtering out each of the other records because they contain null values.
From here we will use the Transpose Tool to pivot your data into 3 separate columns. In the Transpose configuration, choose your unique ID as the KEY FIELD and make sure all other fields are selected as DATA FIELDS.
The result is that you will have your unique ID field, a field called [Name] which contains the names of each of the fields in your data, repeated for every unique ID in your original data, and a [Value] field which contains the individual values for each of the records for each of the columns in the original data.
Now we want to search for Nulls and get a comprehensive list of the unique ID values that do not contain Null values. Bring in a Summarize tool, Group By your unique ID field, and then use the CountNull action.
The result is a list of how many nulls exist in each of your unique ID groups.
Next we can simply filter to the records that have 0 null values and then use their unique IDs to join back to the original data, pulling only those records.
It's important to note here that because I'm only interested in the original fields I intentionally chose to deselect the unique ID and the Null Count fields from the output of the join so that I am left with only those records that have data in all of the weeks.
See the attached v10.5 workflow for an example of the approach above.
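The same transpose-count-join logic can be sketched in plain Python (the record IDs and week fields are made up): count the nulls per unique ID, keep the IDs whose count is zero, and pull back their original rows:

```python
# Original wide data keyed by a unique record ID.
data = {
    1: {"Wk1": 10,   "Wk2": 12,   "Wk3": 9},
    2: {"Wk1": 8,    "Wk2": None, "Wk3": 11},
    3: {"Wk1": None, "Wk2": None, "Wk3": 4},
}

# Transpose + Summarize CountNull, grouped by the unique ID:
null_counts = {rid: sum(v is None for v in rec.values())
               for rid, rec in data.items()}

# Filter to IDs with zero nulls, then "join back" to the original rows:
complete = {rid: data[rid] for rid, n in null_counts.items() if n == 0}

assert null_counts == {1: 0, 2: 1, 3: 2}
assert list(complete) == [1]
```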
Question How do I remove whitespace from my data?? Help!
Answer There are a couple of different whitespace situations you might get yourself into, but the schematic below (from the attached v10.6 example Remove Whitespace.yxmd) has you covered in all of them:
Most of the approaches make use of the Formula Tool’s Trim() function which, without a second argument, defaults to trimming whitespace from your strings. From Designer v10.5 onward, you can also use the Data Cleansing Tool to clean your fields for you! Master it here.
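For reference, Python's string methods behave much like the trimming functions described above – a rough analogy, not the Formula Tool's implementation:

```python
# strip() with no argument removes surrounding whitespace, just as the
# Trim functions default to whitespace without a second argument.
assert "  padded  ".strip() == "padded"
assert "  padded  ".lstrip() == "padded  "   # TrimLeft-style
assert "  padded  ".rstrip() == "  padded"   # TrimRight-style

# Collapsing interior runs of whitespace is a separate step:
collapsed = " ".join(" a  b \t c ".split())
assert collapsed == "a b c"
```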