Question I'm getting an error that says "Type mismatch in operator +" in my formula, what does it mean?
Answer This error often means that you're trying to add a number to a string field. If you are trying to concatenate your fields, you need to be sure that all fields being combined are string values. When Alteryx sees a numeric value next to a + operator, it tries to add the values together rather than concatenate them.
There are (at least) two ways to resolve this error:
1) Use a Select Tool to change the field type before feeding the data into your Formula Tool
2) Use the ToString() function to convert the field to a String value within the expression itself
Both solutions are demonstrated within the attached v10.5 sample.
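The same type mismatch exists in other languages, which can make the fix easier to picture. Here is a loose Python analogue (not Alteryx syntax): mixing a string and a number with + raises an error, and an explicit conversion plays the role of ToString().

```python
# A sketch of the fix in Python: product + price alone would raise a
# TypeError, the Python equivalent of "Type mismatch in operator +".
def label_price(product: str, price: float) -> str:
    return product + ": $" + str(price)  # str() plays the role of ToString()

print(label_price("Widget", 19.99))  # Widget: $19.99
```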
Your data analyses may call for selecting records from within a group or category. To isolate the data you want, try using the “grouping” functions embedded in some of the tools in Alteryx Designer, namely the Summarize, Sample and Sort Tools.
In this scenario, let’s say you have data from an online home goods store. Your data is grouped by category (Bedding, Bath, Kitchen, Dining and Furniture). Each individual product's record contains a SKU, the unit price and the quantity sold over a given time period. For your analysis, you have been tasked with 1) selecting the first product from each category and 2) identifying the most/least expensive items and those with the largest/smallest quantity sold. Follow along using the attached workflow as a reference.
Option 1: Select by First/Last Position
Summarize: To select the first (or last) record from a category of data with the Summarize Tool, group the data into categories with the “Group By” process on the field [Type]. Then, select the first (or last) record in the data stream associated with that group using the First or Last process.
Sample: The Sample Tool’s configuration includes an option to sample data by Group. Sample the last (or first) record from each category while also grouping by [Type].
Option 2: Select by Maximum/Minimum Value
Summarize: To identify the maximum or minimum values of a numeric field, “Group By” the category [Type]. Then, select the Minimum or Maximum value. For example, to find the most expensive item in each category of the data, group the data by [Type] and select the Maximum [Price].
Sort and Sample: Alternatively, sorting the data before sampling records will order the records in ascending or descending order. By utilizing the “Group By” functionality of the Sample tool, the greatest (or least) number of items sold per category can be identified. Note that the Sample tool is not limited to selecting a single maximum or minimum value per category [Type]. By changing the value of N, in this case to 5, the five most popular items sold per [Type] are extracted from the data.
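Both options boil down to the same grouping logic, sketched here in Python (the field names and values are illustrative, not taken from the attached workflow):

```python
# Group records by [Type], then pick the first record per group (like the
# Summarize tool's First action) and the most expensive item per group
# (like the Max action).
from collections import defaultdict

records = [
    {"Type": "Bedding", "SKU": "B-001", "Price": 45.0, "Qty": 12},
    {"Type": "Bedding", "SKU": "B-002", "Price": 89.0, "Qty": 3},
    {"Type": "Kitchen", "SKU": "K-001", "Price": 19.0, "Qty": 40},
    {"Type": "Kitchen", "SKU": "K-002", "Price": 250.0, "Qty": 1},
]

groups = defaultdict(list)
for rec in records:
    groups[rec["Type"]].append(rec)

# Option 1: first record per category
first_per_type = {t: recs[0]["SKU"] for t, recs in groups.items()}

# Option 2: maximum [Price] per category
max_price_per_type = {t: max(recs, key=lambda r: r["Price"])["SKU"]
                      for t, recs in groups.items()}

print(first_per_type)      # {'Bedding': 'B-001', 'Kitchen': 'K-001'}
print(max_price_per_type)  # {'Bedding': 'B-002', 'Kitchen': 'K-002'}
```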
*Attached workflow has been modified for compatibility with Alteryx version 10.0 and above
A large component of data blending is applying mathematical or transformational processes to subsets of your data. Often, this requires isolating the data that meets certain criteria you’ve set. The Conditional functions build expressions that allow you to apply processes to data that satisfies the conditions you set.
Data blending, transformation and cleansing... oh my! Whether you're looking to apply a mathematical formula to your numeric data, perform string operations on your text fields (like removing unwanted characters), or aggregate your spatial data (among many other things!), the Formula Tool is the place to start. With the examples provided below, you should be on your way to harnessing the many functions of the Formula Tool:
When writing expressions that evaluate NULL conditions, it is a common mistake to write the syntax as:
IF [FieldX] = NULL() THEN ...
After all, this is how you would write the expression in SQL. However, the actual syntax needs to be:
IF ISNULL([FieldX]) THEN ...
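The same pitfall shows up in many languages: a missing value should be tested with an explicit null check, not with equality. A loose Python analogue (not Alteryx syntax):

```python
# An explicit null test, like ISNULL([FieldX]) in Alteryx.
def is_missing(value):
    return value is None

# Comparing with == is the trap the article warns about; for instance,
# float("nan") == float("nan") evaluates to False in Python.
print(is_missing(None))   # True
print(is_missing("abc"))  # False
```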
Keep writing those expressions!
You've gotten your long dataset and you want to combine it with another dataset for additional information. Your dataset is nice and clean. Everything is formatted the same, no null values... the whole package. Then you open up the data to join to, and right away you see a ton of clean-up that needs to happen: nulls to replace, strings to format appropriately, extra characters, white space... the list goes on. You launch the Designer, and while it's fast and accurate, you have to set up a new Multi-Field Formula Tool for each situation you need to fix. If only there were a single tool that did it all.
The Generate Rows Tool, which is part of the Preparation tool category, creates new rows of data based on a user defined loop expression. It is especially useful when creating sequences of numbers or dates.
For example, let's say you have a dataset with products that aren't sold very often (not every month), but you would like to create records for every month and fill in quantity and amount as zero for reporting purposes. You can use the Generate Rows Tool to take the earliest month in the dataset and increment it by one month (generating a new row each time) until it reaches the latest month or the current month.
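The loop the Generate Rows Tool runs can be sketched in plain Python (a rough analogue under the assumption of whole-month steps; the dates here are made up):

```python
# Start at the earliest month and keep adding one month until the latest
# month is reached, yielding one new "row" per iteration.
from datetime import date

def month_range(start: date, end: date):
    """Yield the first day of every month from start through end."""
    current = date(start.year, start.month, 1)
    while current <= end:
        yield current
        # Increment by one month, rolling over the year when needed
        if current.month == 12:
            current = date(current.year + 1, 1, 1)
        else:
            current = date(current.year, current.month + 1, 1)

months = list(month_range(date(2016, 1, 1), date(2016, 4, 1)))
print(len(months))  # 4 rows: Jan, Feb, Mar, Apr
```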
Question How do I calculate the number of business days between two dates?
Answer We get this question a lot in Customer Support. It's actually fairly easy to do using just a few tools.
Assuming your beginning and end dates are in two separate fields, you can take a look at the attached sample, which demonstrates the process.
1) Use the Generate Rows tool to fill in the missing dates
2) Use a Formula tool to create a field with the day of the week for each of the filler dates
3) Use a Filter tool to remove the unwanted days of the week
4) Use the Summarize tool to count the number of days that are left
Note that if you don't already have a unique ID on each record, you'll want to use the Record ID tool to add one to make the summarize process easier.
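The four steps above can be sketched in Python: generate the dates, tag each with its weekday, filter out the weekends, and count what's left. This sketch counts both endpoints inclusively and treats Saturday and Sunday as non-business days; adjust either assumption to match your own rules.

```python
from datetime import date, timedelta

def business_days_between(start: date, end: date) -> int:
    days = []
    current = start
    while current <= end:          # step 1: generate the rows (dates)
        days.append(current)
        current += timedelta(days=1)
    # steps 2-3: weekday() gives Monday=0 ... Sunday=6, so keep < 5
    weekdays = [d for d in days if d.weekday() < 5]
    return len(weekdays)           # step 4: count the remaining days

# Mon 2016-06-06 through Fri 2016-06-10 is one full business week
print(business_days_between(date(2016, 6, 6), date(2016, 6, 10)))  # 5
```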
The sample workflow was built in 10.6, but the same general process can be followed in all versions.
Does "Dictionary Sort Order" always place lower case letters before capital letters?
Yes. In the Sort tool's Configuration menu there is an option to "Use Dictionary Order". When checked, it will sort in alphabetical order with lower case first (e.g., a, A, b, B, c, C, etc.).
If you do not have "Use Dictionary Order" checked, it will sort all Upper case first and then all lower case (e.g., A, B, C, a, b, c, etc.).
Check "Use Dictionary Order".
Visit the sort help article or the attached workflow for more details.
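The two orderings can be sketched in Python (a loose analogue; this is not how Alteryx implements the sort). Python's default string sort behaves like the unchecked option, placing all capitals first:

```python
items = ["b", "A", "c", "B", "a", "C"]

# Default (ASCII-style) sort: all upper case before all lower case
ascii_order = sorted(items)
print(ascii_order)       # ['A', 'B', 'C', 'a', 'b', 'c']

# Dictionary order: group letters case-insensitively, lower case first
dictionary_order = sorted(items, key=lambda s: (s.lower(), s.isupper()))
print(dictionary_order)  # ['a', 'A', 'b', 'B', 'c', 'C']
```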
The Neural Network Tool in Alteryx implements functions from the nnet package in R to generate a type of neural network called a multilayer perceptron. By definition, neural network models generated by this tool are feed-forward (meaning data only flows in one direction through the network) and include a single hidden layer. In this Tool Mastery, we will review the configuration of the tool, as well as what is included in the Object and Report outputs.
Alteryx’s spatial functions are not limited to the tools found in the Spatial Palette. The Formula Tool, for example, has a variety of spatial functions that expand the Designer’s spatial capabilities beyond those in the Spatial Tool Palette. This article highlights the Formula Tool’s ability to create spatial objects, particularly points and lines, and calculate distance using the built-in expression library.
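The distance calculation behind such point-to-point spatial functions is great-circle math. Here is a minimal haversine sketch in Python; it assumes a spherical Earth and an approximate radius in miles, and Alteryx's own implementation may differ. The coordinates in the example are illustrative.

```python
# Haversine great-circle distance between two (lat, lon) points, in miles.
from math import radians, sin, cos, asin, sqrt

def haversine_miles(lat1, lon1, lat2, lon2):
    earth_radius_miles = 3958.8  # assumed mean Earth radius
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = (sin(dlat / 2) ** 2
         + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2)
    return 2 * earth_radius_miles * asin(sqrt(a))

# Boulder, CO to Broomfield, CO -- roughly a dozen miles
d = haversine_miles(40.0150, -105.2705, 39.9205, -105.0867)
print(round(d, 1))
```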
Topics discussed below:
Currency Field Formatting
Currency Field Formatting - Strings to Doubles
We get a lot of questions from new users about how to convert fields with currency formats to doubles and vice versa. If you have currency fields in your data that come into Alteryx with the currency format (e.g., $1,354.00) and you want to perform any kind of numeric calculation, you must convert these fields to numeric fields (e.g., Double, FixedDecimal, Int32, etc.). For more information on data types, click here.
There are a couple of ways to convert this string format to a numeric format. Below I will demonstrate two: one using the Formula tool, the other using the Multi-Field Formula tool. Both are similar, but one is a bit more flexible than the other.
Using the Formula Tool:
If you’re using the Formula tool, you must make a new field for the converted string. Make sure the type is numeric; I usually go with Double. Then use the formula shown below. The first part of the expression replaces the $ and commas with empty strings, and ToNumber() then converts the result to a number.
Using Multi-Field Formula:
Using the Multi-Field Formula tool is very similar to using the Formula tool. In this tool we can specify multiple fields if we so choose, and we don’t have to make a new field; we can change the ones we already have. In the configuration, make sure “Text” fields appear in the selection window and select the fields you want to change. The formula in this case is exactly the same, except you use [_CurrentField_] as your field variable. This variable runs through the expression for every field that you selected at the top.
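The clean-up logic itself is small. Here it is sketched in Python (the sample value is illustrative): strip the currency symbol and the thousands separators, then convert what remains to a number.

```python
# Remove "$" and "," (the job Replace() does in the Alteryx expression),
# then convert the cleaned string (the job ToNumber() does).
def currency_to_double(value: str) -> float:
    return float(value.replace("$", "").replace(",", ""))

print(currency_to_double("$1,354.00"))  # 1354.0
```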
Currency Field Formatting - Doubles to Strings
After you have done your calculations and you want the result back in currency format, you can simply use this expression in your Formula tool:
'$'+ToString([FieldName], 2, 1)
More info about this formula here.
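For comparison, the same formatting in Python: two decimal places with a thousands separator, prefixed with a dollar sign, mirroring what '$' + ToString([FieldName], 2, 1) produces in Alteryx.

```python
# Format a double as a currency string, e.g. 1354.0 -> "$1,354.00".
def double_to_currency(value: float) -> str:
    return "${:,.2f}".format(value)

print(double_to_currency(1354.0))  # $1,354.00
```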
Now on to a fun topic: Currency Conversion
Those who gather data that comes in a different currency than the one they want in their reports could always look up conversion rates manually and do the math themselves (using Alteryx, of course). The only problem is that this becomes tedious, and currency conversion rates change all the time. The best way to get this real-time data is to make an API call to a website that offers it.
The workflow I have attached has a macro that I have built (version 10.6), which allows a user to choose the currency their field is in with a dropdown interface and convert it to a different currency. This macro uses the xe.com free API to get currency conversion rates in real time.
The base URL we make for this request is http://www.xe.com/currencyconverter/convert/?Amount=1&From="FROM"&To="TO"
The “FROM” and “TO” will change when the user chooses the currencies. Once that happens, these placeholders are replaced with the currencies’ ISO 4217 codes, and the Download tool gathers all the information pertaining to that URL. After some parsing, we obtain the currency conversion rate and place it into its own field, which we then use to calculate our new currency.
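The URL substitution step can be sketched in Python (the base URL is the one from the article; the USD/EUR codes are just example ISO 4217 values, and the actual download happens in the Download tool):

```python
# Build the request URL by filling in the FROM and TO placeholders.
BASE_URL = ("http://www.xe.com/currencyconverter/convert/"
            "?Amount=1&From={from_code}&To={to_code}")

def build_conversion_url(from_code: str, to_code: str) -> str:
    return BASE_URL.format(from_code=from_code, to_code=to_code)

url = build_conversion_url("USD", "EUR")
print(url)
```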
To learn about APIs and how to connect to them using Alteryx, I would check out this article.
Objective: A dataset has 31 fields, seven of which represent categories of data and 24 of which represent each hour of a day (Figure 1). The objective is to multiply the values in each of the Category fields by the values in each of the Hour fields, effectively creating 24 fields for each of the 7 categories. The data output should contain 199 fields*: the original seven category fields, the 24 original hourly fields and the 168 fields generated in the workflow. The 168 fields should be named using both the Category name and the Hour: Category_1_1, Category_1_2….Category_7_24.
Method: Conceptually, the workflow will take each of the seven categories of data and multiply each categorical field by the values stored in each of the 24 hourly fields. The repetitive nature of this process lends itself nicely to a batch macro configuration to automate the calculations. One major obstacle, however, throws a bit of a wrench in our plans: batch macro outputs “Union” the data, stacking it vertically. That means that the data associated with Categories 1-7 would be included in the output as new records, not as new fields. Because the goal is to see these data “Joined”, or appended laterally, we essentially need to defy the laws of gravity (or, of batch macros, as it were) by leveraging the powers of the Crosstab and Transpose tools. To successfully configure our data laterally, some key steps need to happen at three points in the workflow: (I) before the data enters the batch macro, (II) inside the macro, and (III) after leaving the macro. Follow along with the attached workflow to visualize the process at each step.**
I. Before the data enters the Macro….
The very first step after bringing in data is to add a Record ID to assign each of your records a unique identifier. Not only do we recommend this step when the input data will be undergoing some significant transformations but, ultimately, this Record ID is the glue that holds this whole process together. Without it, it’d be a pretty tough, if not impossible, task to get this data pieced back together.
The next step is to split the data into two streams, one for the categorical data fields and another for the hourly data fields, using Select tools. Then, the data is Transposed, flipping the data vertically, using the Record ID as the Key Field and the Category or Hour fields as the data fields. This step prepares the data for the mathematical processes that will be applied to the data in the batch macro. The transposition of the Category data fields also creates an opportunity to use a Summarize tool to list the values (each of the Categories) that will be fed to the macro’s control parameter.
II. Inside the Batch Macro...
Once the data is brought into the batch macro, the data is Joined together (by Record ID) to assemble the data so that the values stored in the Category fields are alongside their corresponding data from the Hour fields. These values are then multiplied together in the Formula tool. Additionally, a new field, called “Placeholder”, is created that contains the name of the Category that is used for that batch of the macro. Note that this field is connected to the Control Parameter, which indicates that the value in that field will be updated with each batch to reflect the changing Category field names. You may notice that, at first glance, the “Placeholder” field seems unnecessary as it simply duplicates the Category field. However, this field’s importance isn’t fully realized until we start to re-compile the data using Crosstab tools a bit further downstream.
At this point, the data is split again into the data associated with Categories and Hourly data to begin the process of horizontally arranging the data and updating the naming schema of the fields. Using Crosstab Tools, the Category Data is grouped by Record ID and Placeholder, which ensures that this newly created field carries through the macro. The Hourly data fields are split into two separate streams, one for the newly created data in the field “Multiplied” and the original data. Both data streams are updated using a Dynamic Rename tool to add prefixes to field names. The new Hourly data fields, however, are connected to the Control Parameter. This means that, like the value of the “Placeholder” field, the prefix of the field will be updated with every batch of the macro. Before the data leaves the batch macro, it is joined together with a Join Multiple tool. This is the first instance in which you can see that Record ID come in handy with re-assembling our data!
III. After the Batch Macro...
This third and final step of the workflow was developed to overcome the obstacle of the Unioned output of the batch macro process. Clicking on the “Browse” following the Batch Macro will show that all the records for each of the batches are now stacked vertically and our field names only reflect the first batch of data. All the data we want is technically there; we want to visualize it horizontally for a side-by-side comparison. To re-configure the data so that each batch of data is appended laterally, we can use another round of Transpose and Crosstab tools.
First, we’ll transpose the data so that it is, once again, arranged vertically. You might be thinking that this is a step backwards from our end goal. Well, in this case, it’s one step back, then two steps forward! By arranging our data vertically, we not only get at those field names that we need to fix, we also prepare to create our 168 new field names according to our naming schema. The fields we’re primarily concerned about re-naming are those with data that was calculated in the batch macro, which can be filtered out of the data. Then, using RegEx and a Formula tool, we begin the process of creating our 168 new field names. Once again, that seemingly unnecessary “Placeholder” field comes in handy! Since all of our field names contained “Category 1” in them, we can compare our RegEx output (what we don’t want) with the Category that is contained in the “Placeholder” field (what we do want) in a Formula tool to re-construct every possible combination of Category value and Hour that we need. A final Crosstab aligns the data horizontally so that the data can be Joined and visualized in the way we intended. After all that data manipulation, how do we know we got it all back together correctly? Luckily for us, that Record ID we created at the beginning of our workflow ensures that both our Crosstab and Join tools connect everything correctly.
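The whole pipeline can be shown in miniature in Python. This sketch uses 2 categories and 3 hours instead of 7 and 24, and invented field names and values; the point is only the naming schema and the cross-multiplication, not the Transpose/Crosstab mechanics.

```python
# For every Category field and every Hour field on a record, multiply the
# two values and store the result under the name Category_<c>_<h>.
record = {
    "RecordID": 1,
    "Category_1": 2.0, "Category_2": 3.0,
    "Hour_1": 10.0, "Hour_2": 20.0, "Hour_3": 30.0,
}

categories = {k: v for k, v in record.items() if k.startswith("Category_")}
hours = {k: v for k, v in record.items() if k.startswith("Hour_")}

output = dict(record)  # keep the original fields
for cat_name, cat_val in categories.items():
    for hour_name, hour_val in hours.items():
        hour_suffix = hour_name.split("_")[1]
        output[f"{cat_name}_{hour_suffix}"] = cat_val * hour_val

print(output["Category_1_2"])  # 2.0 * 20.0 = 40.0
print(len(output))             # 6 original fields + 6 new fields = 12
```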
*Normally, we would not recommend creating 199 fields, but sometimes visualization or side-by-side comparison is the goal. Inspired by a true story.
**Workflow created in Alteryx Designer v10.5
You want to impress your managers, so you decide to try some predictions on your data – forecasting, scoring potential marketing campaigns, finding new customers… That's great! Welcome to the addictive world of predictive analytics. We have the perfect platform for you to start exploring your data.
I know you want to dive right in and start testing models. It's tempting to just pull some data and start trying out tools, but the first and fundamentally most important part of all statistical analysis is the data investigation.
Your models won't mean much unless you understand your data. Here's where the Data Investigation Tools come in! You can get a statistical breakdown of each of your variables, both string and numeric, check for outliers (categorical and continuous), test correlations to slim down your predictors, and visualize the frequency and dispersion within each of your variables.
Part 1 of this article will give you an overview of the Field Summary Tool (never leave home without it!). Part 2 will touch on the Contingency and Frequency Tables and Distribution Analysis; Part 3 will cover the Association Analysis Tool and the Pearson and Spearman Correlations; and Part 4 will be all the cool plotting tools.
Always, every day, literally every time you acquire a new data set, you will start with the Field Summary Tool. I cannot emphasize this enough, and I promise it will save you headaches.
There are three outputs to this tool: a data table containing your fields and their descriptive statistics, a static report, and the interactive visualization dashboard that provides a visual profile of your variables. From this output, you can select subsets to view, sort each of the panels, view and zoom in on specific values, and it even includes a visual indicator of data quality.
You'll get a nifty report with plots and descriptive statistics for each of your variables. Likely the most important part of this report is '% Missing' – ideally, you want 0.0% missing. If you are missing values, don't fret. You can remove these records or impute those values (another reason knowing your data is so important).
Also check 'Unique Values' – if you have a single unique value in one of your variables, that won't add anything useful to your model, so consider deselecting that variable.
The Remarks field is also very useful – it will suggest field-type changes for fields with a small number of unique values; perhaps such a field should be a string. Or, if some values of your field have a small number of value counts, you may consider combining some value levels together.
The better YOU know your data, the more efficient and accurate your models will be. Only you know your data, your use case, and how your results are going to be applied. But we're here to help you get as familiar as you can with whatever data you have.
Stay tuned for subsequent articles – these tools will be your new best friends. Happy Alteryx-ing!
Here in Customer Support, we often get asked by new users how they can add total sums for every column onto the bottom of their datasets. There are several ways to do this.
An easy suggestion is using the Summarize and Union tools to sum each column and add those sums back onto the bottom of the dataset as a totals row. Sounds simple, right?
An even easier suggestion is to show clients the lovely “Add Totals” macro that is available in the CReW Macro pack, which works for a dataset of any size.
However, there is one limitation to this macro: it cannot add a word like “Total” or “Grand Total” to the string field right before the sums of your rows. That capability is obviously useful for those making reports and tables in Excel.
Attached to this article is a workflow I created (version 10.1) that shows the functionality of this macro as well as how to add the word “Totals” before the summations. You can also see how I created row totals with and without using the CReW macro. Below, I’ll explain how to add this row if you are using the CReW macro; I suggest you download the attached workflow to explore how to do this without the macro.
First, place a RecordID tool before you use the “Add Totals” macro. Then add a Sample tool in one stream out of the macro to get the last record.
Use a Multi-Field Formula tool to replace all Null values in string fields with the word “Total”. In a separate stream, add a Filter tool and filter out the records where RecordID is Null, since the Add Totals macro won’t sum this stream if you unchecked it in the macro configuration.
Lastly, remove the RecordIDs using a Select tool and Union the data back together using the default option: “Auto Config by Name”.
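The overall effect of these steps can be sketched in Python (column names and values are illustrative): sum each numeric column, label the string column “Total”, and append the new row at the bottom.

```python
# Build a totals row from the numeric columns and append it to the data.
rows = [
    {"Region": "East", "Sales": 100, "Units": 5},
    {"Region": "West", "Sales": 250, "Units": 9},
]

totals = {"Region": "Total"}
for field in ("Sales", "Units"):
    totals[field] = sum(row[field] for row in rows)

rows.append(totals)
print(rows[-1])  # {'Region': 'Total', 'Sales': 350, 'Units': 14}
```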
This should work for datasets of all sizes, so it is a useful trick when creating macros that will be used for outputting report tables.
Time series forecasting is using a model to predict future values based on previously observed values. In a time series forecast, the prediction is based on history and we are assuming the future will resemble the past. We project current trends using existing data.