Alteryx Designer Knowledge Base

Definitive answers from Designer experts.
Scenario: You have multiple fields in your data that correspond to customer addresses. Some customers may have more than one address listed in their record, but you want to whittle that list down to one address per customer: the first non-null address found when moving sequentially across a set of fields that contain address data. For our example, we have three fields of address data: Address1, Address2 and Address3. The preferred field is Address1. If Address1 does not contain data, we'll use the data in Address2. If neither Address1 nor Address2 has data, we'll use the data in Address3. The final output should contain a single list of the addresses highlighted in yellow in Figure 1.

Figure 1: The final output will contain a single list of the data highlighted in yellow.

Method 1: Write a Conditional Statement

The most common approach to this type of problem is to craft a conditional statement that selects data based on a set of criteria (in this case, order). For our data, it would look something like this:

IF IsNull([Address1]) AND IsNull([Address2]) THEN [Address3] ELSEIF IsNull([Address1]) THEN [Address2] ELSE [Address1] ENDIF

However, what if I had 20 fields of addresses instead of 3? Just getting that statement with three fields took me too long to write out! If you do have 20 fields, you might want to start typing that formula now...

IF IsNull([Address1]) AND IsNull([Address2]) AND IsNull([Address3]) AND IsNull([Address4]) AND IsNull([Address5]) AND IsNull([Address6]) AND IsNull([Address7]) AND IsNull([Address8])...

You get the idea. And now you're thinking, "You're going to tell me there's a better way, right?!?" Well, yes...I am!

Method 2: Data Manipulation

An alternative method of solving this problem is to manipulate the data using the Transpose, Filter and Sample tools. I'll share some advice from @RodL here: "If you want to make something...truly dynamic, then the 'best practice' when you are dealing with an unknown number of columns is to 'verticalize' the data." In our case, we may know the total number of columns of address data we have; what we don't know is which column the data we want is actually in.

Following @RodL's suggestion, we'll 'verticalize' the addresses using the Transpose tool. This stacks the addresses for each customer in the order of the fields in the table (Figure 2). We'll use the Client ID (or Record ID, if you've chosen to add one) as our Key Field and the fields that contain address data as our Data Fields.

Figure 2: All address fields per Record ID (or Client ID) are stacked vertically in order of the field sequence.

Since Null values are not usable records for our purposes, we'll use a Filter to keep only the usable data (non-Null values). Now that our usable data is stacked vertically in order of field selection, we can Sample the first record from each Record ID (or Client ID) group. We'll configure the Sample tool to return the First N (where N = 1) Records per group (Group by Record ID or Client ID).

Figure 3: Sample the first record from every Record ID or Client ID group.

After some data clean-up with a Select tool, we're left with a column of the selected address for each of our customers.
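Outside of Designer, the same "first non-null across ordered columns" pattern can be sketched in Python with pandas. This is only an illustration of the verticalize idea; the DataFrame contents below are made up, while the Address1-3 and Client ID names come from the example above.

```python
import pandas as pd

# Sample data mirroring the example: some address fields are missing (None)
df = pd.DataFrame({
    "ClientID": [1, 2, 3],
    "Address1": ["10 Main St", None, None],
    "Address2": ["PO Box 5", "22 Oak Ave", None],
    "Address3": [None, "1 Elm Rd", "99 Pine Ct"],
})

# "Verticalize" (Transpose): one row per ClientID/address field, in field order
long = df.melt(id_vars="ClientID", value_vars=["Address1", "Address2", "Address3"],
               var_name="Field", value_name="Address")

# Filter out nulls, then Sample the first remaining record per ClientID
first_address = (long.dropna(subset=["Address"])
                     .sort_values(["ClientID", "Field"])
                     .groupby("ClientID", as_index=False)
                     .first()[["ClientID", "Address"]])

print(first_address)
```

Because the melted rows stay in field order, taking the first record per group is exactly the "first non-null address" rule, no matter how many address columns you add later.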
View full article
Scenario: You monitor the mileage of multiple trucks as they deliver shipments over the course of a week and record additional information about each truck in a file (Truck Metrics). Each truck's cumulative mileage per day is recorded in a separate file (Truck Mileage). Your goal is to update the mileage-related fields in Truck Metrics with the values recorded in Truck Mileage. Note: today is Tuesday, so only the fields for Monday and Tuesday will be updated in the Truck Metrics file.

Methods:

Manually Select Fields to Update

Whether the data is uniquely identified by a Truck Number (or Record ID) or identically ordered in both files, data from Truck Metrics (the table to be updated) and Truck Mileage (the data used for updating) can be Joined together. Then, using the Select functionality within the Join tool, I can manually choose the fields from Truck Mileage that I want to use to replace the fields that need to be updated (Figure 1).

Figure 1: Fields from Truck Mileage (yellow) replace fields from Truck Metrics (blue). Note that fields in yellow are selected while fields in blue have been deselected. Fields that need to be included from Truck Metrics (red) remain selected.

Fantastic! A simple, straightforward way to update fields! But, as any analyst knows, working with data is rarely simple or straightforward. What if you're dealing with 20 fields that need to be updated, not just 2? In that case, manually selecting fields to update is not only tedious but also error-prone. For these types of situations, I recommend a process that allows for a more dynamic approach.

'Verticalize' the Data to Dynamically Update Fields

Transposing, or 'verticalizing', data allows for a more dynamic workflow when you have unknowns in your processing. In a scenario such as this one, you may have an unknown or changing number of fields that will need to be updated in Truck Metrics. Using this approach, we'll first Transpose both data sets so that each Field Name and its associated value sit in a single row (Figure 2).

Figure 2: The data is transposed from Truck Mileage and Truck Metrics. The highlighted fields in Truck Metrics indicate which fields will be updated, as the same field exists in Truck Mileage.

Then, we'll Join our datasets on two fields: Truck Number AND Name. This ensures that fields in Truck Mileage will match the correct fields in Truck Metrics, assuming the field names in both tables are named in the same way. The only selecting we'll have to do is to make sure all fields from the Right data source (in this case, Truck Metrics) are deselected (Figure 3). This ensures that, wherever Truck Numbers and field names match, the updated values will be used.

Figure 3: The Joined fields indicate the fields that exist in both Truck Mileage and Truck Metrics. Fields in yellow (Truck Mileage) are selected to reflect updated values in downstream tools. Fields in blue (Truck Metrics) are deselected.

Note that any unmatched fields have fallen out of the Right side of the Join. To add them back into the data stream, simply Union the Center and Right Joins together, setting the tool to "Auto Configure by Name". Then, to rearrange the data into its original format, use the Cross Tab tool (Figure 4). And, voila! Your data is updated!

Figure 4: The updated fields are highlighted in yellow. All other fields originally included in Truck Metrics are included in the final results as well.
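For readers who think in code, here is a hedged pandas sketch of the same Transpose / Join / Union / Cross Tab pattern. The table contents and the TruckNumber, Monday, Tuesday and Driver columns are assumptions for illustration, not the article's actual files.

```python
import pandas as pd

# Assumed stand-ins for the two files
metrics = pd.DataFrame({"TruckNumber": [1, 2], "Monday": [0, 0], "Tuesday": [0, 0], "Driver": ["Ann", "Bob"]})
mileage = pd.DataFrame({"TruckNumber": [1, 2], "Monday": [120, 95], "Tuesday": [240, 180]})

# Transpose ("verticalize") both tables: one row per TruckNumber / field name
m_long = metrics.melt(id_vars="TruckNumber", var_name="Name", value_name="Value")
u_long = mileage.melt(id_vars="TruckNumber", var_name="Name", value_name="Value")

# Join on TruckNumber AND Name; keep the updated value where a match exists,
# otherwise fall back to the original value (the Union of matched + unmatched rows)
joined = m_long.merge(u_long, on=["TruckNumber", "Name"], how="left", suffixes=("_old", "_new"))
joined["Value"] = joined["Value_new"].fillna(joined["Value_old"])

# Cross Tab back to the original wide layout
updated = joined.pivot(index="TruckNumber", columns="Name", values="Value").reset_index()
print(updated)
```

The left join plus fillna mirrors "deselect the Right fields, then Union the unmatched rows back in": matched field names take the new mileage values, and everything else keeps its original value.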
View full article
Recently a couple of questions came across the Customer Support desk asking how a fiscal calendar could be incorporated into a workflow. Alteryx doesn't have a native tool to create one, but it does have a number of tools that can build one. Here is an example of how this can be done.

Simple Calendar

1. Determine the start and end dates of the fiscal calendar and enter them into a Text Input tool, each on an individual row.

2. Connect a TS Filler tool to generate dates between the start and end dates.

3. Add a Select tool to remove unnecessary fields.

4. Add a Multi-Row Formula tool to create a Day of Week field by assigning each row a day of the week from 1 to 7.

5. Add another Multi-Row Formula tool to calculate the fiscal week: whenever the Day of Week is 1, add a value of 1 to the previous row's fiscal week value. This creates a running week value for the entire year.

That's it!

An example workflow is attached. The example workflow also shows how the fiscal month and week of month may be added. (Workflow is in Alteryx 10.6 version.)
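The same running-week logic can be sketched in a few lines of Python with pandas. The fiscal start and end dates below are assumptions, and the week counter follows the rule from step 5 (increment whenever the day of week rolls back to 1).

```python
import pandas as pd

# Assumed fiscal year boundaries (step 1); TS Filler equivalent (step 2)
dates = pd.DataFrame({"Date": pd.date_range("2016-07-01", "2017-06-30", freq="D")})

# Day of week 1-7, counted from the first day of the fiscal year (step 4)
dates["DayOfWeek"] = (dates.index % 7) + 1

# Fiscal week: add 1 each time DayOfWeek rolls back to 1 (step 5)
dates["FiscalWeek"] = (dates["DayOfWeek"] == 1).cumsum()

print(dates.head(10))
```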
View full article
Unlike snowflakes, it is actually possible for duplicates to exist when it comes to data. To distinguish whether a record in your data is unique or a duplicate, we have an awesome tool called the Unique Tool that will actually turn your data into a unique snowflake.
View full article
There are a couple of different whitespace situations you might get yourself into, but this article has you covered in all of them!
View full article
Have you ever wanted to do a Cross Tab, but needed the results in a particular order?  You've probably discovered that Cross Tab will dutifully change your data to column headings and keep the rest of the data correlated, but that the output columns will be arranged in alphabetical order.  It's as if there were a Select tool with the "Sort on Field Name" Option thrown into your workflow without you putting it there.    
View full article
The Select Tool within the Alteryx Designer is the equivalent of your high school sweetheart: always there when you needed them, and always helping you find out more about yourself. The Select Tool does exactly this by showing you the data type and structure of your data, but it also gives you the flexibility to change aspects of your dataset.
View full article
Be sure to first understand that different people and organizations have different definitions of what they mean by rounding. Once you are clear on what you need, you can use Alteryx to round your numbers.

Example 1: Divide by the place you want to round to, use CEIL to round up, then multiply back by the place. For example, to round 324 up to 400:

CEIL([Number] / 100) * 100

Example 2: CEIL and FLOOR only work at integer precision. The ROUND(x, multiple) function allows you to specify the level of precision to round to, very similar to Excel, but you can't specify the direction; for example, ROUND(3425.123, 1000) would return 3000.

Example 3: Please see the attached v10.0 workflow, which is more dynamic. A Text Input identifies the mapping for the conversion, joined with the length of the source input; an expression is then applied in the Formula tool to produce the final output.

More on rounding.
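If you want to experiment with the divide / CEIL / multiply pattern outside Designer, here is a small Python sketch (the place values of 100 and 1,000 are just the examples above):

```python
import math

def round_up_to_place(number, place=100):
    """Round a number up to the given place, e.g. 324 -> 400 when place=100."""
    return math.ceil(number / place) * place

print(round_up_to_place(324))             # 400
print(round_up_to_place(3425.123, 1000))  # 4000 (always rounds up, unlike ROUND)
```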
View full article
Let's take a look at a data set that needs to be combined into one field for reporting purposes. It shouldn't end up as a simple concatenation of every field: that is no better for reporting purposes than the data we started with, because we only want to see the areas that need improvement, and we don't want to just combine values separated by commas. So how can we get it to show only the "Improve" comments, joined cleanly together?

A formula such as this one could work:

If contains([ProdDessert], "Improve") then [ProdDessert] else "" endif + " & " + If contains([ProdSupplies], "Improve") then [ProdSupplies] else "" endif + " & " + If contains([ProdFood], "Improve") then [ProdFood] else "" endif + " & " + If contains([ProdFoodSvc], "Improve") then [ProdFoodSvc] else "" endif

But that's a lot of typing! And it will require cleaning up the "&" characters and spaces where they aren't needed.

So let's look at a dynamic way of doing this:

First, using a Transpose tool, the columns are turned into rows and the data is filtered for rows that contain the word "Improve".

Then, using a Summarize tool, the remaining rows are concatenated together using a space as the delimiter.

Lastly, some cleanup is done in a Formula tool, and the data is ready for reporting!

Also, take a look at the attached workflow that walks through these steps.
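Here is a hedged pandas sketch of the same Transpose / Filter / Summarize idea. The Prod* field names come from the formula above, but the sample values and the Store column are made up.

```python
import pandas as pd

# Assumed sample data; the Prod* column names come from the formula above
df = pd.DataFrame({
    "Store": ["A", "B"],
    "ProdDessert": ["Dessert - Improve", "Dessert - OK"],
    "ProdSupplies": ["Supplies - OK", "Supplies - Improve"],
    "ProdFood": ["Food - Improve", "Food - OK"],
    "ProdFoodSvc": ["FoodSvc - OK", "FoodSvc - Improve"],
})

# Transpose the product columns into rows, keep only rows that mention "Improve"
long = df.melt(id_vars="Store", value_name="Comment")
needs_work = long[long["Comment"].str.contains("Improve")]

# Summarize: concatenate the remaining comments per store, space-delimited
report = (needs_work.groupby("Store")["Comment"]
                    .apply(" ".join)
                    .reset_index(name="NeedsImprovement"))
print(report)
```

Because the filter runs after the transpose, adding a fifth or fiftieth product column changes nothing in the logic, which is the whole appeal of the dynamic approach.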
View full article
A flat file was parsed into a number of fields, including one field that needs to be converted to a numeric data type. It looks like this in the data:

000000437
000012587

The resulting number is supposed to have four decimal places, so just applying ToNumber([Field]) won't work: it will convert 000000437 to 437, but the required output is 0.0437.

As always, there are a couple of different ways of going about this:

1. Use the Substring(x, start, length) function, which returns the substring of [x] starting at [start] and stopping after [length]. This way, the string can be broken into the first 5 characters and the last 4 characters, with a decimal point inserted in between the two. Then the ToNumber() function can be used as usual.

2. Use the ToNumber(x, bIgnoreErrors, keepNulls) function creatively. ToNumber will return 437 for the first row of data and 12587 for the second one. However, if the number is then divided by 10,000, it will return 0.0437, which is the correct result.

3. Use a Regular Expression. Regex can recognize a pattern and replace it with another value. In this case, regex can find the last 4 characters of the string: (\d{4})$ and replace them with a decimal point plus the pattern it matched: \.$1

Also take a look at the attached workflow to illustrate the solutions.
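All three approaches can be mirrored in a short Python sketch; the input strings are the two values from the example, and everything else is illustrative.

```python
import re

raw = ["000000437", "000012587"]

for s in raw:
    # 1. Substring: first 5 characters + "." + last 4 characters, then convert
    v1 = float(s[:5] + "." + s[5:])

    # 2. Convert to a number first, then divide by 10,000 to shift in 4 decimal places
    v2 = int(s) / 10000

    # 3. Regex: capture the last 4 digits and prefix them with a decimal point
    v3 = float(re.sub(r"(\d{4})$", r".\1", s))

    print(v1, v2, v3)   # all three give 0.0437, then 1.2587
```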
View full article
Let's say you have a dataset with products that aren't sold very often (not every month), but you would like to create data points for every month and fill in quantity and amount as zero for reporting purposes.

This is the perfect time to use the Generate Rows tool. It will allow you to take the earliest month in the dataset and increment it by one month (generating a new row each time) until it has reached the latest month or the current month you are in.

To do so, first use a Summarize tool to identify both the min (earliest) and max (most recent) dates in your data. In this example, we created a field called [CurrentMonth] to be used as the max date, but you could certainly use a date provided in your data set.

Then, set up the Generate Rows tool as follows: we are creating a new field called [Date] that will hold all of the date values between the earliest date in the data set and the current month.

The Initialization Expression, [Min_Date], is the starting value that will be used for the first row. In our case, it is the earliest date in the data set as identified by the Summarize tool.

The Loop Expression is what is being used to generate new rows. In our case, it increments [Date] by one month until the Condition Expression is met. As long as the Condition Expression returns true, another row will be created. Once the Condition Expression returns false, no more rows will be created.

Once these rows are created, you will have to join them back to your dataset and fill in any missing columns appropriately. See the attached workflow for an example of how to do that.
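A rough Python equivalent of the Summarize / Generate Rows / Join pattern is sketched below; the sparse sales table and the "current month" value are assumptions for illustration.

```python
import pandas as pd

# Assumed sparse sales data: not every month appears
sales = pd.DataFrame({"Month": pd.to_datetime(["2016-01-01", "2016-04-01"]),
                      "Quantity": [5, 2], "Amount": [50.0, 20.0]})

# Summarize step: earliest month and the "current" month (assumed here)
min_date, current_month = sales["Month"].min(), pd.Timestamp("2016-06-01")

# Generate Rows step: one row per month from min_date through current_month
all_months = pd.DataFrame({"Month": pd.date_range(min_date, current_month, freq="MS")})

# Join back to the original data and fill the missing months with zero
filled = all_months.merge(sales, on="Month", how="left").fillna({"Quantity": 0, "Amount": 0})
print(filled)
```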
View full article
Sometimes, data will come with multiple header lines; for example, year and quarter might be broken into two rows. That is not very helpful when you are trying to analyze your data.

You could use a Select tool and type in new names, OR you could do a few transformations and use the Dynamic Rename tool to merge the rows dynamically, so you are prepared when 2017 rolls around the corner and they start adding new columns to your file!

To fix data that looks like the above, we start with the Sample tool to split the data into two data streams: one with just the values and one with the header information. Then we transpose the header data so we can fill in the missing years, and use the Summarize tool to concatenate the resulting field values.

Now we can use the header lookup we created as the right input to a Dynamic Rename tool, using the "Take Field Names from Right Input of Rows" option and selecting "Name" as the old field name and "Concat Value" as the new combined field name.

See the attached workflow for details of tool configurations.
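For comparison, here is a hedged pandas sketch of the same idea: fill in the sparse year row, then concatenate it with the quarter row to build one clean set of column names. The file contents (Region, 2015/2016, Q1/Q2) are made up.

```python
import pandas as pd
from io import StringIO

# Assumed file shape: year on the first header row (sparse), quarter on the second
csv = StringIO(
    "Region,2015,,2016,\n"
    "Region,Q1,Q2,Q1,Q2\n"
    "East,10,12,14,16\n"
    "West,20,22,24,26\n"
)

df = pd.read_csv(csv, header=[0, 1])

# Fill in the missing years across the first header row, then concatenate the two rows
years = pd.Series([a for a, b in df.columns]).replace(r"^Unnamed.*", pd.NA, regex=True).ffill()
quarters = [b for a, b in df.columns]
df.columns = [y if y == q else f"{y} {q}" for y, q in zip(years, quarters)]

print(df.columns.tolist())   # ['Region', '2015 Q1', '2015 Q2', '2016 Q1', '2016 Q2']
```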
View full article
The Sample Tool allows you to selectively pass patterns, block excerpts, or samples of your records (or groups of records) in your dataset: the first N, the last N, everything after skipping the first N, 1 of every N, a random 1-in-N chance for each record to pass, and the first N%. These options come in handy often enough in data preparation that you'll find the tool in our Favorites category, and for good reason. While it's a great tool for sampling your data sets, you can also use it for:
View full article
The Auto Field tool examines your data and automatically optimizes the field type and length. Take a look at your data with a Select tool, follow this up with an Auto Field, and follow that up with another Select tool to see what kind of changes you've made. After you run the module, you can examine each Select tool to garner a before-and-after view of the adjustments made to the fields. You can even take this a step further and add a few Browse tools to see how your database actually decreases in size; you may be surprised by how much! In the view below, the file size was reduced by about 40% with the Auto Field tool, on just 50,000 records and one field. Now consider running a file of millions of records, and the size decrease becomes really substantial!

Before Auto Field View:

After Auto Field View:
View full article
Any tool that performs a Sort will increase processing time. Many tools in Alteryx utilize a sorting process as a facet of their inherent functionality, or when a Group By option is used; the Summarize tool does this, for example. Alteryx has the ability to remember Sorts, or, said another way, it knows if data has already been sorted upstream. This is a behavior that you can take advantage of as yet another means of optimizing your module's processing. In the example shown, the process in Green is more efficient, as sorting will only be performed on the data once. For the process in Red, sorting will have to be performed twice.

Tools that Sort

Tools that use Sorts: Cross Tab, Dynamic Input, Join, Multi Row Formula, Running Total, Sample, Tile, Unique, Allocate Report, Chart, Layout, Render, Table, Behavior Create Profile, Buffer, Find Nearest, Generalize, Poly Build, Smooth, Spatial Match, Summarize, Charting, Join Multiple, Sort, Calgary Loader, Make Group, Fuzzy Match

Tools that use Temp Files and/or Internal Sorts: Spatial Process, Block Until Done, Auto Field, Distance (Guzzler Only), Trade Area (Guzzler Only or Non-Overlapping), Run Command (Temp files only)

Tools that use Temp Files and/or Internal Sorts at the process level: Location Optimizers, Batch Macros (for secondary inputs), Iterative Macros (for secondary inputs)
View full article
Alteryx provides GUI tools that offer similar functionality to many SQL commands. Although minimal SQL scripting may be necessary in order to properly configure tools, the amount required to complete analysis is significantly reduced.
View full article
Scenario: You've been given data for a new project and it contains lots of extra (and unnecessary) rows before you even get to the information you need to work with. Look familiar?

For many Alteryx users, this situation is all too common. Luckily, there's a pretty easy way to resolve this issue using the Sample and Dynamic Rename tools!

Method: To demonstrate this approach, we'll use some sample data that has extraneous information and space at the top (rows 1-4) of the spreadsheet in Figure 1 (below). While the information itself might be important, it's going to interfere with our data analysis. What we really want is the information in row 5 as our header names and the information from row 6 onwards as our data.

Figure 1: The data in rows 1-4, as seen in Excel, should not be included in the data analysis.

Rather than manually re-format our dataset, we'll bring it into Alteryx and let the data preparation begin! Using an Input tool, we'll navigate to the location of our data file. The tool gives us a preview of what to expect when bringing in the data (Figure 2). This format is nowhere near perfect, but we still have a few tricks up our sleeve!

Figure 2: The Input tool shows how the data will be brought into Alteryx. Our heading is not correct, and we still have a few lines of data (in light yellow) to eliminate while keeping the data we want to analyze (in dark yellow).

A quick visual assessment indicates that we'll need to skip the first three rows of data (the information in row 4 will become our field names). We can remove these rows using a Sample tool. In the Sample tool configuration (Figure 3), we'll opt to "Skip the 1st N Records"; in this case, N will be equal to 3.

Figure 3: Set the number of records to skip, or remove, from the top of the dataset.

Now that we've removed the first 3 rows of data, we are much closer to the format we'd like to work with. The data we'd like to use as the field names (Number, FirstName and State) is now in the first row of data. We'll use the Dynamic Rename tool to rename our fields using the option to "Take Field Names from the First Row of Data" (Figure 4). And, voila! Our data is now ready for the next steps of our analyses.

Figure 4: After removing unwanted rows of data and renaming the fields, our data is ready for further analyses.

*See the attached sample workflow (v10.5) for an example of this process.
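If you ever need to do the same clean-up outside Designer, the Sample + Dynamic Rename combination maps roughly onto skipping rows and promoting a header row. In this pandas sketch, "messy.xlsx" is a hypothetical file laid out like Figure 1.

```python
import pandas as pd

# A hedged equivalent of the Sample + Dynamic Rename steps: skip the junk rows
# and promote the next row to column names. "messy.xlsx" is a hypothetical file.
df = pd.read_excel("messy.xlsx", skiprows=4, header=0)
# skiprows=4 drops the four extraneous Excel rows; header=0 then uses the next
# row (Number, FirstName, State) as the field names.
print(df.head())
```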
View full article
Question: How can you calculate the standard deviation of the population (SDpop) rather than the standard deviation of the sample (SDsample)?

Answer: Please find the attached macro, special thanks to one of our Community users!

Standard deviation of the population (SDpop) and standard deviation of the sample (SDsample) are very similar; the only difference is that in SDpop the sum of the squared deviations from the mean is divided by n instead of n-1, as it is for SDsample (where n is the number of data points in the vector). This macro takes a vector of doubles, which needs to be named 'dataVals', and it returns the standard deviation of the population. It is accurate out to the 9th or 10th decimal place.
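This is not the attached macro, just a quick numpy illustration of the n versus n-1 difference described above (the dataVals numbers are made up):

```python
import numpy as np

# Not the attached macro - just an illustration of the n vs n-1 difference
dataVals = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

sd_pop = np.std(dataVals, ddof=0)      # divide squared deviations by n
sd_sample = np.std(dataVals, ddof=1)   # divide squared deviations by n-1

print(sd_pop, sd_sample)               # 2.0 vs roughly 2.138
```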
View full article
If you've gotten accustomed to having headers and sub-headers (fields/sub-fields) in your tables, much like those you can build with the "Merge & Center" Excel feature, you'll be happy to hear that with just a couple of formatting steps we can build a similar format in Alteryx.

The attached workflow goes into the details of the approach, but the essence is below:

1. Split your columns/fields into your desired categories (making them sub-headers).
2. Create tables from your categories and add your supra-headers.
3. Combine the reporting elements: use the Union tool to stack them and the Summarize tool to combine the reporting elements horizontally.
4. Combine your sub-tables into a larger table, if needed (adding another header).

There you have it! Your tables are now rendered in style.
View full article
Errors from the R tool, and from macros that use R (such as the Association Analysis tool), can be a challenge to troubleshoot. Since R is very particular about naming conventions, and the R macros are very particular about delimiters, one easy way to preemptively limit these errors is to make sure your field names are R-friendly and compliant.

Best practice is to use only letters, numbers and the underscore; any special characters will likely upset R. One hint is "Error: No valid fields were selected." This means that one of the Pipe-to-Rows tools embedded in the R macros could not appropriately parse the field names.

For example, the following errors came from the Association Analysis tool, where the field names contained the gamut from asterisks to ampersands:

Error: Association Analysis (4): The Action "select fields A" has an error: ParseError: Expected entity name for reference at line 7 and column 30 while processing "Action_UpdateRawXml"
Error: Association Analysis (4): The Action "select fields A" has an error: ParseError: Expected whitespace at line 13 and column 63 while processing "Action_UpdateRawXml"
Error: Association Analysis (3): The Action "select fields A" has an error: ParseError: Unterminated entity reference, 'test' at line 4 and column 30 while processing "Action_UpdateRawXml"

If you have a thousand fields with every special character, don't fret: Dynamic Rename and regex to the rescue!

Using the Rename Mode: Formula, select all your fields, and replace any non-word character with an underscore.

Crisis averted! Happy Alteryx-ing.
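The same clean-up can be sketched outside Designer with a regular expression; this is only an illustration of the "replace any non-word character with an underscore" idea, not the Dynamic Rename formula itself, and the sample field names are invented.

```python
import re

# Hypothetical field names containing characters that upset R
fields = ["Sales*2016", "Profit&Loss", "Store #", "R_friendly_already"]

# Replace every non-word character (anything other than letters, digits, underscore)
cleaned = [re.sub(r"\W", "_", name) for name in fields]

print(cleaned)   # ['Sales_2016', 'Profit_Loss', 'Store__', 'R_friendly_already']
```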
View full article