We have extended our Early Bird Tickets for Inspire 2023! Discounted pricing goes until February 24th. Save your spot!

Engine Works

Under the hood of Alteryx: tips, tricks and how-tos.
DataNath
16 - Nebula

 

Background information

 

Let’s be honest–the Formula tool plays a pretty integral role within many workflows. We use it to create a variety of expressions, ranging from simple mathematics and record-flagging all the way through to building spatial objects like points, lines, and polygons. However, as is the case with most things in life, there is room for mistakes that can result in various errors within the tool. Fortunately, a lot of these are very common and incredibly easy to fix, often caused by missing a configuration step or using syntax that is slightly off. Below we’ll work through a handful of common errors and how they can be resolved:

 

1. The field “” is missing…

 

This is probably the most ‘doh!’ of all the common issues when using the Formula tool and one that I’ve certainly been caught out by more times than I’d care to admit! However, it’s easily done, especially if you’re writing more complex formulae where you’re more focused on that or if you’re trying to build quickly for something like a Grand Prix/Weekly Challenge. So, what’s the problem here? Well, you either a) haven’t selected a field to update or b) if you’re creating a new field - haven’t provided a name for it!

 

DataNath_0-1674147097909.png

 

As you can see in the screenshot above, the configuration for this expression still says ‘Select Column,’ which, when clicked on, will provide a drop down for you to ‘+ Add Column’ or update an existing one. Creating a new one will look like this:

 

DataNath_1-1674147097910.png

 

2. Data type errors

 

2.1. Invalid type in operator…

 

This error is a little more sneaky, as it is triggered by a mismatch in data types and these aren’t always obvious. It’s such an easy thing to overlook, especially if we’re evaluating data that seems like it should be X data type but is in fact Y, as the source can often pre-determine data types (i.e. CSV files storing all fields as strings). Here’s an example where someone may look at the incoming values and think, ‘Oh, the [Amount] field here is clearly numeric! Let’s put a flag in place.’ However, if we force this to be a string for the sake of illustration, you can see that when we come to build our expression, it falls over:

 

DataNath_2-1674147097837.png

 

DataNath_3-1674147097819.png

 

Whilst the [Amount] field here appears to be a number, because the underlying data type is a string, it’s actually just a bit of text and we therefore can’t directly compare this to a numeric value (the 200 threshold in this case). If the source of your data is doing something along these lines, then you can fix this in a couple of ways.

 

1. Place a Select tool ahead of your Formula tool and make the field numeric:

 

DataNath_4-1674147097886.png

 

2. Wrap the field in the ToNumber() function when referencing it within your Formula tool. This won’t change the data type of the underlying field but will convert that instance into a numeric value so that it can be evaluated as such:

 

DataNath_5-1674147097886.png

 

2.2. Forgetting to create fields with the correct data types!

 

Whilst this isn’t an error in the same sense as the other points here (i.e. Alteryx won’t spit and hiss a particular message/code at you), it is a mistake that people often make that ties in with what we have just discussed. Perhaps one of the most common scenarios where a field’s data type throws users off is when they build formulae that reference the results of a previous expression. Within the Formula configuration, the default ‘Data Type’ in the dropdown is V_WString, and therefore, if we forget to change this, it’ll throw things off as demonstrated below:

 

DataNath_6-1674147097877.png

 

You’ll also notice that instead of being colour-coded (as seen in the following image), the text of the second expression actually remains black, indicating an error in the evaluation.

This time using the correct data type (numeric):

 

DataNath_7-1674147097885.png

 

2.3. Comparing Date to DateTime

 

Though this implies the same issue as above, I thought this was deserving of its own section in the blog just to help highlight it. Users are often thrown off by this behaviour within the Filter tool when their expected results get excluded, and there are cases where this will lead to unexpected results in the Formula tool as well. For this scenario, we have a DateTime field - [Start Date]. However, if we try to evaluate it against a standard Date, you’ll see that in Alteryx we don’t get the result we would expect (neither of the records are after January 1st 2022 and so they should both return ‘On time’). This is because, whilst they are essentially the same for the sake of our check, we’re trying to put 2022-01-01 00:00:00 up against 2022-01-01 and, in a literal sense, they’re not the same:

 

DataNath_8-1674147097908.png

 

So how do we remedy this one? Well, we can use the two approaches used above in section 2.1.; change [Start Date] to Date, which will trim off the time element, wrap [Start Date] in the ToDate() function when using it in your expression, or, one extra option… you could add a specific time to your check to make the reference point a DateTime as well:

 

DataNath_9-1674147097885.png

 

3. Wrong number of parameters for the function…

 

Anyone building Formula expressions in Alteryx will have noticed that when you start typing, the software will provide prompts of what you may be looking for and that you can click and auto-populate. Within these prompts, we’re shown how the function ought to be structured and, if you hover over these pop-ups, a more detailed version containing an explanation will appear:

 

DataNath_10-1674147097877.png

 

The error here, therefore, comes from users not providing all of the arguments/parameters needed to enact the function. For example, in DateTimeDiff() shown above, we can see that we need to provide two dates (dt1,dt2), as well as a unit of time to show the difference in (u, relating to seconds, hours, days, months, etc. - but not weeks!). If we were to forget to provide the timeframe for example, we’d soon be met with the error message outlined above:

 

DataNath_11-1674147097831.png

 

Against a properly formed function:

 

DataNath_12-1674147097915.png

 

Whilst this error message is fairly self-explanatory and you can go to the prompts to figure out what you’re missing, I thought it was also worth noting that some functions do have optional parameters, which are often (but not always) denoted in square brackets. One example is the ToString() function. Here we only need to provide ‘x’, but we can define the other arguments as well if we wish:

 

DataNath_13-1674147097915.png

 

Above, we have used the numDec argument to give values 2 decimal places, as well as the [thousandsSeparator] parameter to add in a comma separator. However, the function works just fine without any of these:

 

DataNath_14-1674147097889.png

 

4. I’m getting 0 or -1 as a result instead of the value I have defined!

 

This next error is one I’ve seen less frequently than the others. However, it’s caused by a very subtle change that turns part of the expression into a boolean check, throwing off the output. As always, we’ll go through a quick example to demonstrate. Now, using the dataset from the start of this blog, let’s say we want to take 500 away from our [Amount] field if the [Name] is ‘DataNath’ or ‘Ben’. That would look something like this:

 

DataNath_15-1674147097889.png

 

However, what we sometimes notice is people typing out something like this instead:

 

DataNath_16-1674147097858.png

 

Whilst this looks like an innocent addition where you’re defining what that field should become, this is actually turning the second (THEN) part of the IF statement into a boolean check (i.e. is this true or false?). Therefore, in this example, the second (incorrect) expression is actually saying: IF their name is ‘DataNath’ or ‘Ben’, then is the [Amount] field equal to the [Amount] field minus 500? Which it obviously isn’t, and therefore Alteryx returns 0 (for Boolean results, False = 0 & True = -1). The reason the ‘Sarah’ and ‘Jessica’ records are unaffected is that they don’t satisfy the first part of the IF statement i.e. they’re not in the group of (‘DataNath’,’Ben’) and so this boolean check isn’t applied to them. The takeaway message here, ultimately, is to just use ‘THEN <outcome>’ rather than ‘THEN [Field] x <outcome>’ where x is an operator or function etc.

 

5. My AND/OR statement isn’t returning the correct values!

 

This problem is a little more inconspicuous and something that tends to catch people out while building more complex if statements with multiple checks, and the concept the fix relies on is one that people have often seen when learning/using maths. So, for example, let’s say we’re trying to make a ‘Review’ flag for Ben and Jessica within the North. We might be tempted to write something like this:

 

DataNath_17-1674147097941.png

 

However, as illustrated above, we get extra flags appearing where we aren’t expecting them. This is because we haven’t added parentheses to our ‘OR’ check. If you think of everything before the ‘THEN’ as its own True/False statement, what we’re actually saying here is: If the [Region] is ‘North’ AND the [Salesperson] = ‘Jessica’, then that is TRUE so apply the flag. However, unlike AND (which links checks, making them reliant on one another), OR in this sense, will reset the True/False test, and so the extra part of the statement where we evaluate [Salesperson] = ‘Ben’ is independent. This is why, regardless of the [Region], we see the flags appearing next to his name at all times (because, again, that check is TRUE for those records).

 

Although I only mention parentheses here, there are a couple of ways to address this.

 

1. Wrap our ‘OR’ check - in an English sentence this turns the above statement into: If the [Region] is North, AND the [Salesperson] is either Jessica OR Ben, apply the flag:

 

DataNath_18-1674147097927.png

 

2. Use the IN operator - this creates a list for the function to check against at the same time as the [Region] check:

 

DataNath_19-1674147097941.png

 

Hopefully this blog has helped to address why you’re getting the pesky error messages you may have encountered from time to time. If you have any others that you come across regularly, then feel free to comment below or search the Designer Discussions forum, where it’s likely that someone else has posted the same problem and been provided with a solution. 

 

The list covered here is only a small handful of common issues. There is, therefore, the scope for a follow-up if we notice an increase in more questions surrounding writing expressions! Below are a few extra links to helpful resources for using the Formula tool:

 

https://community.alteryx.com/t5/Alteryx-Designer-Knowledge-Base/Tool-Mastery-Formula/ta-p/32675

https://community.alteryx.com/t5/Alteryx-Designer-Knowledge-Base/Using-Conditional-Statements-to-Cha...

https://community.alteryx.com/t5/Interactive-Lessons/tkb-p/interactive-lessons/label-name/Writing%20...

https://help.alteryx.com/20223/designer/formula-tool

https://community.alteryx.com/t5/Alteryx-Designer-Knowledge-Base/tkb-p/knowledgebase/label-name/expr...

 

Special thanks to Nicole Johnson (@NicoleJ) and Jarrod Thuener (@JarrodT) for their help with this blog.

 

Nathan Purvis
Tableau/Alteryx Consultant

Ex-forensic scientist turned data analyst. Particular interests include business, finance, technology and blockchain.

Ex-forensic scientist turned data analyst. Particular interests include business, finance, technology and blockchain.

Comments