The Sample Tool lets you selectively pass patterns, blocks, excerpts, or samples of your records (or groups of records) in your dataset: the first N, the last N, all but the first N, 1 of every N, a random 1-in-N chance for each record to pass, and the first N%. These options come in clutch pretty often in data preparation – that’s why you’ll find it in our Favorites Category, and for good reason. Beyond being a great tool to sample your data sets, you can also use it for:
The consummate Reporting tool, the Layout Tool isn’t named after its ability to “lay out” your reporting work like Terry Tate, Office Linebacker. No, the name is borrowed from the tool’s core functionality: freely arranging reporting objects in your workflows to fit a templated “layout” format to be output by the Render Tool. Giving you the flexibility to organize reporting snippets horizontally or vertically, set the width and height of columns and rows, and add borders and separators, the Layout Tool is the last requisite step in compiling, formatting, and generating a report in Alteryx – probably never to be made by hand again!
We frequently get questions about how to flag rows in a data set that are missing values in any column.
In the example below, we will demonstrate how to add a new column to the data that gives a count of null or empty values per row. The end result will assign a count of 1 to each of the highlighted rows in the image. The new column can be used in a Filter Tool to isolate rows of data that have missing values.
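Outside Alteryx, the same row-level count can be sketched in pandas (the column names and data here are invented for illustration):

```python
# Count null or empty values per row, then flag rows with any missing data --
# a rough pandas analogue of the Formula + Filter approach described above.
import pandas as pd
import numpy as np

df = pd.DataFrame({
    "Number": [1, 2, 3],
    "FirstName": ["Ann", None, "Cole"],
    "State": ["CO", "TX", ""],
})

# Treat empty strings like nulls, then count missing values per row.
df["MissingCount"] = df.replace("", np.nan).isna().sum(axis=1)

# Rows with at least one missing value can then be isolated,
# as a Filter tool would do in Alteryx.
flagged = df[df["MissingCount"] > 0]
```

Rows 2 and 3 each get a count of 1 (a missing first name and an empty state, respectively), so only they survive the filter.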
You've been given data for a new project and it contains lots of extra (and unnecessary) rows before you even get to the information you need to work with. Look familiar?
For many Alteryx users, this situation is all too common. Luckily, there's a pretty easy way to resolve this issue using the Sample and Dynamic Rename tools!
To demonstrate this approach, we'll use some sample data that has extraneous information and space at the top (Rows 1-4) of the spreadsheet in Figure 1 (below). While the information itself might be important, it's going to interfere with our data analysis. What we really want to see is the information in Row 5 as our header name and the information from Row 6 onwards to be our data.
Figure 1: The data in rows 1-4, as seen in Excel, should not be included in the data analysis.
Rather than manually re-format our dataset, we'll bring it into Alteryx and let the data preparation begin! Using an Input Tool, we'll navigate to the location of our data file. The tool gives us a preview of what to expect when bringing in the data (Figure 2). This format is nowhere near perfect, but we still have a few tricks up our sleeve!
Figure 2: The Input Tool shows how the data will be brought into Alteryx. Our heading is not correct, and we still have a few lines of data (in light yellow) to eliminate while keeping the data we want to analyze (in dark yellow).
A quick visual assessment indicates that we'll need to skip the first three rows of data (the information in Row 4 will become our field names). We can remove these data using a Sample Tool. In the Sample Tool configuration (Figure 3), we'll opt to "Skip the 1st N Records"; in this case, N will be equal to 3.
Figure 3: Set the number of records to skip, or remove, from the top of the dataset.
Now that we've removed the first 3 rows of data, we are much closer to the version of the data format we'd like to work with. The data we'd like to use as the field names (Number, FirstName and State) are now in the first row of data. We'll use the Dynamic Rename Tool to re-name our fields using the option to "Take Fields from the First Row of Data" (Figure 4). And, voila!! Our data is now ready to use for the next steps of our analyses.
Figure 4: After removing unwanted rows of data and re-naming the fields, our data is ready for further analyses.
*See the attached sample workflow (v10.5) for an example of this process.
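For comparison, the same skip-then-promote-headers pattern can be sketched in pandas (the sample text below is invented, not the workflow's data):

```python
# A rough pandas analogue of the Sample + Dynamic Rename steps: skip the
# leading junk rows, then use the next row as the column names.
import io
import pandas as pd

raw = io.StringIO(
    "Report generated 2016-01-01\n"
    "Prepared by: Someone\n"
    "Internal use only\n"
    "Number,FirstName,State\n"
    "1,Ann,CO\n"
    "2,Bob,TX\n"
)

# header=3 skips the first three lines and uses the fourth as field names,
# mirroring "Skip 1st N Records" followed by Dynamic Rename.
df = pd.read_csv(raw, header=3)
```

The result has `Number`, `FirstName`, and `State` as headers and only the two data rows beneath them.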
Binary (bit level or bitwise) operations operate on one or more bit patterns or binary numerals at the level of their discrete bits. They are typically used to manipulate values for comparisons and calculations - and encryption!
These functions are in the Formula Tool, in the Integer submenu of the Math menu:
Binary values are stored as strings, retaining the leading zeros. To use these functions, you'll have to convert these to numerals. Use the BinToInt() function to convert your binary strings to numeric values. Once you've completed your calculations, use IntToBin() to get the binary values. Note: you'll need to add your leading zeros back in, using the PadLeft function.
If you need the character associated with the binary value, use CharToInt(). Hex calculations work similarly.
BinaryAnd(n,m) – performs the logical AND operation on two binary numerals
BinaryNot(n) – performs logical negation, forming the complement (bits that are 0 become 1, and those that are 1 become 0)
BinaryXOr(n,m) – performs exclusive disjunction, which essentially means 'one or the other, but not both'. In other words, the statement is true if and only if one is true and the other is false.
A common use for XOR is a means of doing a parity check. A bitstring has even parity if the number of 1s in the string is even. It has an odd parity if the number of 1s is odd. If you XOR the bits together, you can tell whether a bitstring has even or odd parity.
ShiftLeft(n,b) / ShiftRight(n,b) - shifting left is equivalent to multiplication by powers of 2. So 5 << 1 is equivalent to 5 * 2, and 5 << 3 is equivalent to 5 * 8. Shifting to the right is equivalent to division by powers of 2.
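These functions map directly onto Python's bitwise operators, so here is a small sketch of the same operations, including the parity check and the BinToInt/IntToBin round trip (function names are ours, not Alteryx's):

```python
# Python equivalents of the Formula-tool binary functions described above.
def binary_and(n, m):
    return n & m       # BinaryAnd

def binary_xor(n, m):
    return n ^ m       # BinaryXOr

def shift_left(n, b):
    return n << b      # ShiftLeft: multiply by 2**b

def parity(n):
    """XOR all bits together: 1 if the count of 1-bits is odd, else 0."""
    p = 0
    while n:
        p ^= n & 1
        n >>= 1
    return p

# Round-trip a binary string the way BinToInt()/IntToBin() would,
# padding the leading zeros back in as noted above.
bits = "00101"
value = int(bits, 2)                             # BinToInt
restored = format(value, "b").zfill(len(bits))   # IntToBin + PadLeft
```

As described above, `shift_left(5, 1)` gives 10 (5 * 2) and `shift_left(5, 3)` gives 40 (5 * 8).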
Please see the attached v10.5 workflow for a simple secret message conversion.
If you're like me, you may find RegEx a bit intimidating. While it can be confusing, once you get a handle on it you'll see just how powerful it actually is. Below is a post from the Discussion boards that should make an introduction into RegEx less daunting. The full post is below, and I've attached the sample workflow to this post as well.
A lot of data prep questions I get asked about seem to come back to the same answer - "Use Regex!"
However, RegEx isn't exactly easy to understand for someone who has never come across it before.
This is how I started out, but by dissecting other people's uses of RegEx, I started to understand how RegEx works, and now I seem to use it in pretty much every workflow I create (slight exaggeration).
So I decided to upload the attached workflow that has 12 use cases for RegEx for you to use, adapt, learn from and improve on. Please download the attached Alteryx workflow and explore what's going on and you'll soon be more than comfortable with RegEx too!
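As a small taster before you open the workflow, here's the flavor of extraction those use cases cover, sketched in Python (whose regex syntax is very close to the RegEx Tool's); the text and patterns are invented examples, not taken from the workflow:

```python
# Two illustrative RegEx extractions on a made-up string.
import re

text = "Order #12345 shipped to Denver, CO 80202 on 2016-10-27"

# Extract a US ZIP code: two capital letters, a space, then five digits.
zip_code = re.search(r"\b[A-Z]{2} (\d{5})\b", text).group(1)

# Extract an ISO-format date.
date = re.search(r"\d{4}-\d{2}-\d{2}", text).group(0)
```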
I've also put some useful RegEx links on the original blog for this workflow here: http://www.theinformationlab.co.uk/2016/10/27/regex-practice-using-alteryx/
Feel free to upload your use cases for RegEx on this discussion post, along with examples, and perhaps we can start to create a small repository for everyone - from RegEx newbies to those highly skilled in this complex but extremely useful tool.
The attached workflow is in v10.5. Refer to our RegEx Tool Mastery if you'd like to learn more about the tool and its many uses!
You are in charge of marketing for Joe’s Bike Shop, a Colorado-based chain of bicycle shops. A new list of potential customers (DemandLoc.yxdb) has come into the system, and you want to be sure that the closest Joe’s Bike Shop location (ExistingLoc.yxdb) within a reasonable driving distance (in this case, 20 miles) is advertised to each Potential Customer.
As always with Alteryx, there are many ways to approach this problem. Since our files of Potential Customers and existing locations of Joe's Bike Shop contain spatial objects (points), we can take advantage of the Find Nearest Tool. Though a number of other spatial processes could be used to handle this analysis, this scenario is a great opportunity to use the Find Nearest Tool because it:
Includes two anchors for data input, which removes the obstacle of complex data arranging;
Accepts a distance criterion (Joe's Bike Shop must be within 20 miles of our potential customer), which eliminates the need to create Trade Areas around my existing locations of Joe’s Bike Shop;
Calculates distance (or Drivetime) and direction, metrics I’d normally use the Distance tool to obtain; and
Allows a user to specify the number of closest spatial objects to identify, making the ol’ Sort and Sample trick unnecessary!
Let’s get started!
We have two input data files, the locations of Potential Customers (DemandLoc.yxdb, 3294 records) and the existing locations of Joe’s Bike Shops (ExistingLoc.yxdb, 6 records). Since we want to find the closest (within 20 miles) Joe’s Bike Shop location to a Potential Customer, we’ll set the Potential Customers file as the Target Input and the Existing Locations as the Universe input:
Then, we’ll configure the Find Nearest Tool’s parameters. Select the Spatial Object field for the Target and Universe inputs. Determine the number of points to find and specify the maximum distance* between a Target and Universe point. Note that the Find Nearest Tool also allows us the opportunity to re-name and re-order fields from incoming data, as well as select/de-select new fields generated in the Find Nearest Tool (FindNearestRank, DistanceMiles, and Direction).
*Users with the Spatial Data Bundle may also select Drive Time or Distance and the TomTom dataset to calculate.
The Find Nearest Tool has two output anchors, one for matched (M) Target inputs and another for unmatched (U) Target inputs. According to our analysis, 2341 of our 3294 potential customers have a Joe’s Bike Shop location within 20 miles of their location. The remaining 953 Potential Customers are outside the 20 mile distance.
In the event that two Universe objects are the same distance from a Target object, the Find Nearest Tool will output multiple matches for a Target Object. Since we specified that one Joe’s Bike Shop should be found per Potential Customer, we’ll verify that a Potential Customer has not matched to more than one shop location using a Unique Tool (see attached workflow).
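For intuition about what Find Nearest is doing under the hood, here's a from-scratch Python sketch of the same idea: for each target point, find the closest universe point and keep it only if it's within the 20-mile cutoff. The coordinates and names are invented for illustration:

```python
# Nearest universe point per target, with a maximum-distance criterion.
from math import radians, sin, cos, asin, sqrt

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two lat/lon points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + \
        cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 3956 * 2 * asin(sqrt(a))

shops = {"Boulder": (40.01, -105.27), "Pueblo": (38.25, -104.61)}   # Universe
customers = {"A": (40.02, -105.25), "B": (37.0, -100.0)}            # Targets

matched, unmatched = {}, []
for cust, (clat, clon) in customers.items():
    best = min(shops.items(), key=lambda s: haversine_miles(clat, clon, *s[1]))
    if haversine_miles(clat, clon, *best[1]) <= 20:  # the 20-mile criterion
        matched[cust] = best[0]                      # the M (matched) anchor
    else:
        unmatched.append(cust)                       # the U (unmatched) anchor
```

Customer A matches the Boulder shop about a mile away; customer B has no shop within 20 miles and falls out of the unmatched side, just like the 953 Potential Customers above.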
Looking to optimize your Find Nearest Workflow? Check out this article for tips!
Question If I have a workflow that creates a set of data by region, how can I group the data by region and then send each region's data separately to that region's Vice President, all within one workflow?
Answer There are several ways to do this, depending on what form you want the reports to take. The attached workflow, created in 10.6, gives you three examples as described below. Please note that you will need to populate the second Text Input Tool and the "From" in the Email Tool with valid email addresses in order to test.
1. Include Data as a Table within the Body of Email
If you are dealing with a small set of records and columns, this might be your best bet. This approach will insert a table into the body of the email, giving the recipient immediate access to the data without having to open an attachment.
The email will look similar to this:
2. Create Flat Files and Attach to Emails
This option is best if you simply need to send specific datasets to specific recipients and the report doesn't require any formatting. Here, you are simply creating file names based on Region, writing out the data by Region (using the "Take File/Table Name From Field" option in the Output Data Tool), and then summarizing the recipient and file name data so that only one email is sent per group.
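The grouping logic can be sketched outside Alteryx like this (regions, sales figures, and addresses are all made up; the `to_csv` call is commented out so the sketch stays side-effect-free):

```python
# Split data by Region, derive one file name per region, and build one
# (recipient, attachment) row per region -- one email per group.
import pandas as pd

sales = pd.DataFrame({
    "Region": ["West", "West", "East"],
    "Sales":  [100, 150, 90],
})
recipients = {"West": "west.vp@example.com", "East": "east.vp@example.com"}

emails = []
for region, chunk in sales.groupby("Region"):
    filename = f"{region}.csv"             # "Take File Name From Field"
    # chunk.to_csv(filename, index=False)  # one flat file per region
    emails.append({"To": recipients[region], "Attachment": filename})
```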
The Excel files will look like this:
3. Create a Formatted Excel Table and Attach to Email
This one is similar to the previous workflow, except we are creating a formatted table using the Table Tool, and then writing out the table to Excel using the Render Tool, in order to keep the formatting. In the Render Tool, we are using the "Group Data Into Separate Reports" feature.
In this example, the Excel files will look like this:
I hope you will find this useful for your use case. Thank you for reading!
VLOOKUP is one of the most commonly used functions in Excel. VLOOKUP takes a lookup value and finds that value in the first column of a lookup range. Complete the function's syntax by specifying the column number to return from the range. In other words, VLOOKUP is a join. One column of data is joined to a specified range to return a set of values from that range.
The objective of this posting is twofold:
1. Demonstrate how a VLOOKUP can be done using Alteryx.
2. Demonstrate how a VLOOKUP can actually be easier to do in Alteryx.
For the sake of argument, let's say we have two worksheets in one Excel file. Sheet 1 contains a master list of sales data. The data looks like this:
Sheet 2 contains a select list of sales people and contains First Name and Last Name:
Using a VLOOKUP function, we want to take the list of sales people, match it to sales people found in Sheet 1 and return sales data for each sales person. Notice our list of sales people in Sheet 2 has two sales people named 'Angelina'. Sheet 1 has multiple people named 'Angelina' and 'Michael'. Using First Name alone will not return reliable results using VLOOKUP. Further, matching on Last Name - or even a concatenation of both First Name and Last Name - isn't always reliable either. There are a lot of people named 'Michael Thomas' in the world, and several may exist in your data range. The best way to complete our VLOOKUP is to use a unique identifier for each sales person. But again, for the sake of argument, let's say each First Name and Last Name combination produces a unique name/identifier.
The first thing we need to do is concatenate First Name and Last Name in a new field.
We'll do the same thing with our data in Sheet 2. For approximate-match VLOOKUPs to work properly, the data must be sorted. In our case, we'll sort both Sheet 1 and Sheet 2 in ascending order on 'Concatenated Name':
Now we're ready to use our =VLOOKUP function:
It worked! Let's see how to do the same thing in Alteryx. (Hint: there's more than one way to do it.)
We'll start by bringing in our data from Sheets 1 and 2:
As with the Excel example, we'll concatenate First Name and Last Name in a new field. For clarity, let's name the concatenated name in Sheet 1 'Full Name' and Sheet 2 'New Name'. We'll do this using a Formula Tool expression:
Finally, let's attach a Find Replace Tool where Sheet 2 attaches to the 'F' input of Find Replace and Sheet 1 attaches to the 'R'.
The configuration for the Find Replace looks like this:
Notice 'Sales' is selected under 'Append Field(s) to Record'. When we run the workflow, we get the sales data by sales person just like we did in Excel:
But there's an even simpler way to perform a VLOOKUP in Alteryx! Simply join the two Sheets with a Join Tool.
Take a look at how the Join is configured:
We joined our data on First Name and Last Name without having to concatenate the two fields first. Further, we selected the sales data we want returned in the same Join Tool.
Alteryx has an added advantage over Excel's VLOOKUP function. Let's say you want to return Sales and Opportunities. With Alteryx, this can be handled in a single Join. With Excel, we would need to do multiple VLOOKUPs.
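The same join-instead-of-VLOOKUP idea can be sketched in pandas with made-up data, matching on both name fields without concatenation and returning Sales and Opportunities in one pass:

```python
# VLOOKUP as a join: match Sheet 2's people against Sheet 1's master list
# on First Name AND Last Name, pulling back multiple columns at once.
import pandas as pd

sheet1 = pd.DataFrame({
    "First Name": ["Angelina", "Angelina", "Michael"],
    "Last Name":  ["Jones", "Smith", "Thomas"],
    "Sales":      [500, 700, 300],
    "Opportunities": [5, 7, 3],
})
sheet2 = pd.DataFrame({
    "First Name": ["Angelina", "Michael"],
    "Last Name":  ["Smith", "Thomas"],
})

# Like the Join Tool: join on both name fields, no sorting or
# concatenated key required.
result = sheet2.merge(sheet1, on=["First Name", "Last Name"], how="inner")
```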
A truck is scheduled to deliver shipments to warehouses along a route spanning from Denver, CO to Abilene, TX. Deliveries are scheduled for certain days, so the route must be followed in the specified order (1 – 15). To spatially analyze the delivery truck’s route and area served by this truck’s shipments, we want to map:
1) the truck’s end-to-end route, and
2) the truck’s daily scheduled route.
To create the spatial objects (polylines) associated with the truck's routes, we'll use the Poly-Build tool. Located in the Spatial tool palette, the Poly-Build tool takes two or more point spatial objects and creates a sequenced polyline or polygon to represent that data. For this particular use case, we’ll utilize the “Sequence Polyline” Build Method to visualize and analyze our data.
1) Map the truck's end-to-end route
To map the truck’s entire delivery route, we’ll use the Poly-Build tool to generate a polyline that connects stops 1 to 15 in order. To control the order in which the spatial point objects are connected, we’ll use the field ‘Trip Sequence’ as the Sequence Field in the tool’s configuration:
The result is a single polyline that represents the 1,925 mile long route that the delivery truck will cover on its scheduled route:
2) Map each of the four daily delivery routes:
After some data preparation*, we’ll use a Multi-Row Formula tool to first group the truck’s stops into four segments, one for each day of the shipment schedule. Then, we’ll use the Poly-Build tool to create polylines for each of the day’s routes. Again, we'll select Sequence Polyline as our Build Method and 'Trip Sequence' as the Sequence Field. To create multiple polylines, one for each day, we’ll also select to group our data by the field 'Group'.
*See attached workflow for steps taken to prepare data for daily truck route creation.
The result is four polylines that represent each of the four daily segments of the truck's scheduled delivery route.
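The sequencing-and-grouping logic behind both maps can be sketched in plain Python (the stop coordinates and day groupings below are invented; a real route has 15 stops):

```python
# Sequence Polyline in miniature: order the stops by Trip Sequence and
# connect them -- once end-to-end, and once per day group.
stops = [
    {"seq": 1, "day": 1, "pt": (39.74, -104.99)},   # Denver, CO
    {"seq": 2, "day": 1, "pt": (38.84, -104.82)},
    {"seq": 3, "day": 2, "pt": (37.17, -104.51)},
    {"seq": 4, "day": 2, "pt": (35.19, -101.85)},
    {"seq": 5, "day": 2, "pt": (32.45, -99.73)},    # Abilene, TX
]

# End-to-end route: one polyline over all stops, ordered by sequence.
full_route = [s["pt"] for s in sorted(stops, key=lambda s: s["seq"])]

# Daily routes: group by day, each group still ordered by sequence --
# the equivalent of adding 'Group' as the Group By field.
daily_routes = {}
for s in sorted(stops, key=lambda s: s["seq"]):
    daily_routes.setdefault(s["day"], []).append(s["pt"])
```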
I have had several questions from clients over the last few weeks looking to use Alteryx as the ultimate ‘middle man’ between databases and their end output, without explicitly writing anything to memory in the process. Alteryx will happily oblige in this domain whilst also providing seamless data blending & advanced analytics in the process.
Here are some potential ways you can achieve this goal:
Using our In-Database tools means you never actually bring data down into memory, but when you are ready you can use our Data Stream Out Tool and seamlessly pass this into our analytic tool set or to your output location.
No Browse Tools
In addition to the above point, if you choose not to add Browse Tools, Alteryx will only cache around 1 MB of data at each tool in a temp location. This temp location is then cleared when you close Alteryx. Therefore, it is only kept in memory for the duration of the development of the workflow rather than indefinitely. Your default temp location can be found in Workflow Properties.
Changing workflows or Applications
When writing out data you need not write to hard-coded paths – you can reference “%temp%” in the file path. This will then write to the default temp location set in Workflow Properties, outlined above. You can then reference this file location in the next workflow by using a combination of the Directory, Sort, Sample, and Dynamic Input Tools to read in that file. By default, Alteryx will write an Alteryx Engine file for each run. Using the above tools will allow you to dynamically read in the latest file and data. The bonus is that these engine files get cleared out on a scheduled basis, so the cached data will not persist over time. The workflow depicting this (attached) was built in Alteryx 10.6.
Output to the database or via one of our connectors which use an API
You can utilize the method mentioned in ‘No Browse Tools’ above; however, at the end of your workflow you can output directly to a database using one of our connectors via an API. As Alteryx and Tableau work together, clients often use the Publish to Tableau Server Macro to take data directly from an Alteryx workflow up to Tableau without keeping any data in memory. Again, the 1 MB of cached data will be removed from the Alteryx engine files, and Alteryx pushes the data via a ‘POST’ command directly to the Tableau Server.
You have a dataset containing information on customers’ survey responses (Y/N), the Customer Segment (Corporate, Small Business, etc) to which they belong, and other location data. You have been tasked with finding the percent of each Responder type in the entire data set. To perform these calculations, we’ll need two types of counts of data. The first is a “conditional count”, or a count of records that meet certain criteria. The second is a count of all records in a dataset. Alteryx has two nifty ways to help us obtain these values. We’ll use both the Count Records and Summarize tools to help us with these tasks!
Use the Summarize Tool’s “Count” function
The Summarize tool allows us to count the number of records that meet certain criteria. For our particular examples, we want to find the number of records for each Responder Type, Yes or No. We’ll use the Summarize tool to Group by “Responder”. Then, we’ll Count the number of Customer IDs for each Responder type (Figure 1).
Figure 1: The Summarize tool will Count all records, Count NonNull records, CountDistinct (Unique) records and CountDistinct (NonNull) records.
Want to drill down in your data even more? How about find the number of Responder Types per Customer Segment? Again, the Summarize tool can help! Group by “Customer Segment”, then by “Responder”, then Count the “Customer IDs”. See the attached workflow to see this example in action.
Use the Count Records Tool
To calculate the percent of each response type for our entire dataset, we’ll need to know the total number of responders in our dataset. While there are a few ways to go about getting that number, I’ll highlight the use of the Count Records Tool (well, macro, technically). “The WHAT?” you ask? I’ve heard that before.
The Count Records Tool. It’s easy to miss and I can count the number of times I’ve seen this tool in a user's workflow on one hand. However, it’s one of those tools for which you quickly find so many uses! It does exactly what its name suggests: it counts the number of records from an incoming data stream. The tool itself requires no configuration. Simply insert the tool into your workflow and receive a count of the number of records in the incoming dataset, even if there are zero records from incoming data* (Figure 2):
*The Count Records tool will return a record with value of 0, whereas the Summarize tool will simply not return any records.
Figure 2: The Count Records tool has no configuration and returns the number of records (rows) from an incoming data stream.
Now that we have the counts that we need for our calculations, we're ready to move forward with our data analysis! Please see the attached workflow for the completed demonstration of this process.
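Both counts, and the percentage calculation they feed, can be sketched in pandas with invented survey data:

```python
# The two Summarize configurations plus the Count Records equivalent:
# conditional counts per group, and a total row count for the percentage.
import pandas as pd

survey = pd.DataFrame({
    "Customer ID": [1, 2, 3, 4, 5],
    "Customer Segment": ["Corporate", "Corporate", "Small Business",
                         "Small Business", "Corporate"],
    "Responder": ["Yes", "No", "Yes", "Yes", "No"],
})

# Summarize: Group By Responder, Count the Customer IDs.
by_responder = survey.groupby("Responder")["Customer ID"].count()

# Drill down: Group By Segment, then Responder, then Count.
by_segment = survey.groupby(["Customer Segment", "Responder"])["Customer ID"].count()

# Count Records equivalent: total rows in the incoming data stream.
total = len(survey)
pct_yes = by_responder["Yes"] / total * 100
```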
Let's start with the basics of how to create a report map in Alteryx. To start off, ensure that the layers you want to show in your map have a spatial object field. This can be checked by placing a select tool and confirming that there is a column of type 'SpatialObj.'
With the Input Data Tool (master it here), our Connector Tools, and the Download Tool (master it here), data is plentiful in the Designer. But what about manually entered user data? In analytics, we’re often trying to avoid human-entered data (unless we’re cleaning it) because it is more prone to error. In spite of the risks, there are a number of situations where manually entered data can be useful in Alteryx. However, it’s dangerous to go alone; take the Text Input Tool and simplify those instances with the techniques below:
Often in data preparation, the need to order your records will arise. When that situation occurs, the Sort Tool has your back. It’s just that sort of tool. Effortlessly arranging your records – be it in alphabetical, numeric, or chronological order – while not a mind-numbingly complex operation, has ample utility. Sorting your records upstream of many tools can even optimize processing time. The fairly simple use cases below are techniques that frequently pop up in the data blending trenches:
The Message Tool within the Alteryx Designer is your own personal car alarm. It can raise warnings or errors when your data doesn't meet user-specified criteria, or it can be set up to tell you when data does not match.
The Message Tool can be set up to pick up records before, during and after the records have passed through the tool itself. This makes it useful for evaluating your dataset at different parts of your workflow.
Easily the most used tool in the parsing category, the Text To Columns Tool makes for extremely quick dicing of delimited fields. To use it, you only need to specify a delimited field, delimiter(s), and whether you’re parsing to rows or columns (you’ll need to specify the number of columns to parse into with this selection) – and you’re off. Any way you slice it, this tool has you covered:
You monitor the mileage of multiple trucks as they deliver shipments over the course of a week and record additional information regarding each truck in a file (Truck Metrics). Each truck’s cumulative mileage per day is recorded in a separate file (Truck Mileage). Your goal is to update the mileage-related fields in Truck Metrics with the values recorded in Truck Mileage. Note: today is Tuesday, so only the fields for Monday and Tuesday will be updated in the Truck Metrics file.
Manually Select Fields to Update
Whether the data is uniquely identified by a Truck Number (or Record ID) or identically ordered in both files, data from Truck Metrics (the table to be updated) and Truck Mileage (the data used for updating) can be Joined together. Then, using the Select functionality within the Join tool, I can manually choose the fields from Truck Mileage that I want to use to replace the fields that need to be updated (Figure 1).
Figure 1: Fields from Truck Mileage (Yellow) replace fields from Truck Metrics (Blue). Note that fields in yellow are selected while fields in blue have been deselected. Fields that need to be included from Truck Metrics (Red) remain selected.
Fantastic! A simple, straightforward way to update fields! But, as any analyst knows, working with data is rarely simple or straightforward. What if you’re dealing with 20 fields that need to be updated, not just 2? In that case, manually selecting fields to update is not only tedious but also error-prone. For these types of situations, I recommend a process that allows for a more dynamic approach.
'Verticalize' the Data to Dynamically Update Fields
Transposing, or ‘verticalizing’, data allows for a more dynamic workflow when you have unknowns in your processing. In a scenario such as this one, you may have an unknown or changing number of fields that will need to be updated in Truck Metrics. Using this approach, we’ll first Transpose both data sets to configure each Field Name and its associated value in a single row (Figure 2).
Figure 2: The data is transposed from Truck Mileage and Truck Metrics. The highlighted fields in Truck Metrics indicate which fields will be updated, as the same fields exist in Truck Mileage.
Then, we’ll Join our datasets on two fields: Truck Number AND Name. This ensures that fields in Truck Mileage will match to the correct fields in Truck Metrics, assuming the fields in both tables are named the same way. The only selecting we’ll have to do is to make sure all fields from the Right data source (in this case, Truck Metrics) are deselected (Figure 3). This ensures that, wherever Truck Numbers and field names match, the updated values will be used.
Figure 3: The Joined fields indicate the fields that exist in both Truck Mileage and Truck Metrics. Fields in yellow (Truck Mileage) are selected to reflect updated values in downstream tools. Fields in blue (Truck Metrics) are deselected.
Note that any unmatched fields have fallen out of the Right side of the Join:
To add them back into the data stream, simply Union the Center and Right Joins together, setting the tool to “Auto Configure by Name”. Then, to rearrange the data into its original format, use the Cross Tab tool (Figure 4). And, voila! Your data is updated!
Figure 4: The updated fields are highlighted in yellow. All other fields originally included in Truck Metrics are included in the final results as well.
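For readers who also work in pandas, the same dynamic update idea has a compact analogue: overwrite whichever columns of Truck Metrics also appear in Truck Mileage, matched on Truck Number, without listing the fields by hand. The data below is invented:

```python
# DataFrame.update matches on the index (Truck Number) and on shared
# column names -- the same "join on Truck Number AND field Name" idea
# as the transpose approach above.
import pandas as pd

metrics = pd.DataFrame({
    "Truck Number": [101, 102],
    "Monday":  [0, 0],
    "Tuesday": [0, 0],
    "Driver":  ["Pat", "Sam"],
}).set_index("Truck Number")

mileage = pd.DataFrame({
    "Truck Number": [101, 102],
    "Monday":  [120, 95],
    "Tuesday": [260, 180],
}).set_index("Truck Number")

metrics.update(mileage)   # updates Monday/Tuesday, leaves Driver untouched
```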
Recently a couple questions came across the Customer Support desk asking how a fiscal calendar could be incorporated into a workflow. Natively Alteryx doesn’t have a tool to create one, but Alteryx does have a number of tools to make a fiscal calendar. Here is an example of how this can be done.
1. Determine the start and end dates of the fiscal calendar and enter them into a Text Input Tool, each on an individual row
2. Connect a TS Filler tool to generate dates between the start and end dates
3. Add a Select Tool to remove unnecessary fields
4. Add a Multi-Row Formula Tool to create a Day of Week field by assigning each row a day of the week from 1 to 7
5. Add another Multi-Row Formula Tool to calculate the fiscal week. Whenever the Day of Week is 1 add a value of 1 to the previous row’s fiscal week value. This will create a running week value for the entire year
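Steps 1 through 5 can be sketched in plain Python; the fiscal year below is an invented example that starts on a Monday:

```python
# Generate the dates, assign a 1-7 day-of-week, and bump the fiscal week
# whenever the day counter cycles back to 1 (steps 1-5 above).
from datetime import date, timedelta

start, end = date(2016, 10, 3), date(2016, 10, 31)   # step 1

rows = []
d, fiscal_week = start, 1
while d <= end:                                      # step 2: TS Filler
    day_of_week = (d - start).days % 7 + 1           # step 4
    if day_of_week == 1 and d != start:              # step 5
        fiscal_week += 1
    rows.append((d, day_of_week, fiscal_week))
    d += timedelta(days=1)
```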
An example workflow is attached (in Alteryx 10.6 format). It also includes an example of how the fiscal month and week of month may be added.
Question I have a table of sales data with each column being a week's worth of sales. I only want records that have data in each of those fields and want to filter out all records that have Null values. How can I do this?
Answer There are two basic elements necessary to make this happen. The first is that all records in the original table have a unique ID. If you do not have a unique ID in your data, go ahead and add a Record ID Tool.
In the sample data you can see we will want data from Rows 1 and 6 while filtering out each of the other records because they contain null values.
From here we will use the Transpose Tool to pivot your data into 3 separate columns. In the Transpose Tool's configuration, choose your unique ID as the Key Field and make sure all other fields are selected as Data Fields.
The result is that you will have your unique ID field, a field called [Name] which contains the names of each of the fields in your data, repeated for every unique ID in your original data, and a [Value] field which contains the individual values for each of the records for each of the columns in the original data.
Next, we want a comprehensive list of the unique ID values that do not contain Null values. Bring in a Summarize Tool, Group By your unique ID field, and then use the CountNull action.
The result is a list of how many nulls exist in each of your unique ID groups.
Next we can filter to keep only the unique IDs that have 0 null values, then use those IDs to join back to the original data and pull only the complete records.
It's important to note here that because I'm only interested in the original fields I intentionally chose to deselect the unique ID and the Null Count fields from the output of the join so that I am left with only those records that have data in all of the weeks.
See the attached v10.5 workflow for an example of the approach above.
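The whole transpose-count-join sequence can be mirrored in pandas with a small invented example:

```python
# Melt to long form (Transpose), count nulls per ID (Summarize: CountNull),
# keep IDs with zero nulls (Filter), and pull their rows back (Join).
import pandas as pd
import numpy as np

sales = pd.DataFrame({
    "ID": [1, 2, 3],
    "Week1": [10, np.nan, 7],
    "Week2": [12, 5, np.nan],
})

long_form = sales.melt(id_vars="ID")                       # Transpose tool
null_counts = long_form.groupby("ID")["value"].apply(
    lambda v: v.isna().sum())                              # CountNull action
complete_ids = null_counts[null_counts == 0].index         # Filter: 0 nulls
result = sales[sales["ID"].isin(complete_ids)]             # Join back on ID
```

Only record 1 has data in every week, so it is the only row that survives.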