Data Science

Machine learning & data science for beginners and experts alike.
CailinS
Alteryx

Does this sound familiar? You just watched a fantastic demonstration for advanced users on regression modeling. You think (who wouldn’t?!) “These tools look amazing…imagine what I can do!” So you jump into Alteryx and start plugging in your data! “BUT WAIT! What are all these error messages?!” - this is the stuff Alteryx nightmares are made of…

 

  [Image: the example workflow and its error messages]

 

...but it doesn’t have to be!

 

Predictive analytics is a complex thing to tackle. There’s theory, business sense, and data (why can’t they all agree?)! Errors can occur when any of those things are bad, and sometimes simply because you haven’t run the workflow yet! In this blog you will learn the simple steps to troubleshoot errors in the predictive tools. Once you have the basics down, you’ll get to dive into the world of R troubleshooting. I will present some of the error messages I’ve come across, and resolved, to give you a starting place and hopefully the courage to take on any errors you encounter. You can do this!!

 

If time is of the essence and you can’t read everything now, there is a summary of the steps at the end of the post as well as the most common causes of R error messages in the Alteryx predictive tools.

 

I’ve attached a workflow at the bottom of this post so feel free to open that and follow along.

 

The first thing you need to know about the predictive tools is that they are all macros. (If that means nothing to you, Help Menu – Sample Workflows – Tutorials – Build a Macro is a great place to start!) Because they are macros:

 

  1. They may behave differently than other tools.
  2. They contain multiple tools that you can’t see.
  3. They can be opened (right click, Open Macro) so you can see the tools and methods that make up the Macro.
  4. Some information is suppressed by default, and it could be important.

 

Let’s Begin…by running the workflow

Speaking of #1, sometimes a tool will have an error message after it is configured until you run it once…so start there! As you may know, many of the predictive macros are based on R: The R Project for Statistical Computing, specifically any tool that has an ‘R’ in the tool icon! In cases like this, the underlying R code doesn’t know what data is coming until the first run. For some tools, this unknown metadata causes the tool to be in error until the data has been passed through the first time. In the same vein, it’s better to hold off on connecting and configuring downstream tools if the current tool is in an error state, because many Alteryx tools rely on known metadata for successful configuration!

 

Now for the fun stuff

In the example above there are 13 error messages and 4 warnings. Ouch. Where do we start? Well, assuming you have had at least one error in Alteryx before now, you know the first step is to find the tool(s) with an error and select it. (Tip & Trick: you can click on the tool name in your Results – Messages window to go straight to the offending tool!) My advice is to pick the first error and start there, since the first error could be the cause of all the subsequent errors.

 


Error: Create Samples (27): Error: The estimation and validation samples exceed 100%.

 

This one is pretty simple – the estimation and validation samples cannot sum to more than 100% and they are set to 133%...so take heart! Not all the errors are [seemingly] written in a foreign language. In fact, many errors will indicate exactly what the problem is and may even suggest how to fix it.
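
The rule behind this message can be sketched in a few lines of Python. This is only an illustration of the check, not the Create Samples macro's actual code: the function name `check_sample_percentages` and the 80/53 split are assumptions (the example only tells us that the total was 133%).

```python
def check_sample_percentages(estimation_pct, validation_pct):
    """Raise ValueError if the two sample percentages together exceed 100."""
    total = estimation_pct + validation_pct
    if total > 100:
        raise ValueError(
            f"The estimation and validation samples exceed 100% (total: {total}%)."
        )
    return total


# Hypothetical configuration that totals 133%, like the one in the example.
try:
    check_sample_percentages(80, 53)
except ValueError as err:
    print(err)
```

A split such as 70 and 30 passes the check and simply returns the total.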

 

At this point you can click on the ‘next’ error and continue fixing simple things, or you can rerun the workflow to see if some of the errors go away once the first one is resolved. Since the dataset in this example was literally built to break these tools (no pain, no gain, right?), we aren’t so lucky. In this case, the next error looks something like this: [image: error message] …so what now?

 

It’s time to look under the hood

Because R is running code behind the scenes (Don’t believe me? Right click and open one of the tools with an ‘R’, and press Ctrl + F to search for the R tool and see the code inside!), some of the error messages you will encounter come directly from R and may truly feel like a different language. In R, an error message may only make sense in the context of the surrounding messages…but there aren’t any other messages! Luckily, there ARE more messages; they are just hidden. When a macro is brought into an Alteryx workflow, the default setting is to show only macro errors in the Results – Messages window. This is a great thing most of the time, because there could be hundreds of tools in a macro and thousands of messages, and that information would likely confuse the macro user if it were always shown. But when there is an error, the extra information can be helpful. To get it, turn on the ‘Show All Macro Messages’ setting: click on the workflow white space to see the workflow properties, then find the setting on the Runtime tab.

 


Now re-run the workflow and find the tool with the first error (Tool name (#) message), then focus on its messages. Often, the message right before or right after the first error indicates the problem. (Tip & Trick: look at the messages for this tool only by selecting the tool, then clicking the Messages icon in the properties window.) In this example there are 16 messages for this tool alone, and 200 for the workflow once macro messages are enabled.

 

[Image: the messages for the selected tool, including the error]

 

Although these messages are generated by R (vs Alteryx), many of them are straightforward and easy to resolve. Here, the message right below the error references missing values being present, and missing values can cause a lot of problems in R. It is always the first thing to check for (with a Field Summary tool!) and my first suspicion when I see an error in one of the predictive tools.

 

In Alteryx, there are myriad ways to deal with missing data and fix the issue (e.g. Filter to remove rows, Formula to replace values, Select to exclude fields). The Data Investigation tools will help you find issues in the data and often indicate how to resolve them. You will still need to decide, for instance, whether a variable is important enough to fix and what value should replace its missing data. Those questions can only be answered by knowing your data and how important each piece is (or isn’t) to your target behavior. Re-run the workflow as usual to confirm the issues are resolved. In this example, it is error free and ready for action!
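
If you would like to see what a missing-value check amounts to outside of Alteryx, here is a minimal sketch in plain Python. It is not what the Field Summary tool does internally. The function name `count_missing`, the `Income` field and all of the values are made up for illustration; only the field name `Food_Away` comes from the example.

```python
import math


def count_missing(rows):
    """Return {field: number of missing values} for a list of dict records."""
    counts = {}
    for row in rows:
        for field, value in row.items():
            is_missing = (
                value is None
                or (isinstance(value, str) and value.strip() == "")
                or (isinstance(value, float) and math.isnan(value))
            )
            counts[field] = counts.get(field, 0) + int(is_missing)
    return counts


# Made-up records; None and NaN both stand in for a null.
records = [
    {"Food_Away": 1993.33, "Income": 52000.0},
    {"Food_Away": None, "Income": 48000.0},
    {"Food_Away": 694.0, "Income": float("nan")},
    {"Food_Away": 441.67, "Income": 61000.0},
]

missing = count_missing(records)
fields_to_fix = [field for field, n in missing.items() if n > 0]
print(missing)
```

Any field that ends up in `fields_to_fix` is a candidate for one of the fixes above: filter the rows, replace the values, or exclude the field.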

 


Finally…hit the road aka the information superhighway

When the steps above don’t resolve all issues, it is time to hit the road! There will be error messages that aren’t immediately clear, messages that tell of an issue beyond your understanding. When those messages and errors occur, know that R is a robust tool and language. It also has a very robust community! So my final piece of advice is to head to the internet and use the wealth of resources there! To get the most out of your search, copy/paste only the R error message portion (leaving out the Alteryx-specific pieces). In the messages above (had we not fixed them so easily), ‘Error in na.fail.default(list(Food_Away = c(1993.33, 238.33, 694, 441.67,  : missing values in object’ is the R-specific error. I find that pasting the error and adding ‘in R’ usually results in a helpful article or two!
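
The copy/paste step can be sketched as a tiny Python helper. This is a rough illustration rather than anything Alteryx provides: the function name `r_search_query` is hypothetical, and it assumes the R portion of a message starts at the text "Error in ". The sample line is taken from the AB Controls log in the comments below.

```python
def r_search_query(message):
    """Keep only the R part of a message and append 'in R' for searching."""
    marker = "Error in "
    start = message.find(marker)
    # If the marker is absent, fall back to the whole message.
    r_part = message[start:] if start != -1 else message
    return r_part.strip() + " in R"


line = (
    "00:01:45.451 - Error - ToolId 4: Tool #141: "
    "Error in if (uniq.cntrls) { : missing value where TRUE/FALSE needed"
)
query = r_search_query(line)
print(query)
```

The timestamp, tool ID and tool name are dropped, so the search engine only sees the part that other R users are likely to have posted about.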

 

In conclusion…you can do it!

You really can do it! These tools can open up a whole new world if you will take the time to prep, investigate, and troubleshoot when things aren't perfect the first time. Below I have summarized the whole post to make it easier to brush up when you need to apply these concepts! Good luck!

 

Steps:

  1. Run the workflow
  2. Re-Run the workflow
  3. Click on the first error message in the Results – Messages
  4. Read it :) Fix the issue if it’s straightforward
  5. Turn on Macro Messages and re-run the workflow
  6. Find the tool with the first error message and open its messages
  7. *World of R troubleshooting from here on out* Read the messages/warnings right above and below the first error and fix the issue if it’s straightforward
  8. If it’s still unclear, copy/paste the R error into a search engine and use the R community to guide you (Stack Overflow is a great option!)

Common Causes:

  1. Missing data
  2. Bad data (e.g. long string values and/or special characters in the field names OR data)
  3. Bad theory (e.g. when there are more predictor variables than there are rows of data)
  4. Bad connection (e.g. you plugged the Report output into a tool that wants the model Object output)
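
Two of these causes (special characters in field names, and more predictors than rows) are easy to screen for before a model ever runs. The following Python sketch shows one way to do it. The function name `preflight_warnings`, the rule for what counts as a "special character", and the sample field names other than `Food_Away` are all assumptions made for illustration.

```python
import re


def preflight_warnings(field_names, n_rows, n_predictors):
    """Return warnings for two of the common causes listed above."""
    warnings = []
    for name in field_names:
        # Assumption: anything other than letters, digits and underscores
        # counts as a "special character" in a field name.
        if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", name):
            warnings.append(f"Field name may cause trouble: {name!r}")
    if n_predictors > n_rows:
        warnings.append(
            f"{n_predictors} predictors but only {n_rows} rows of data"
        )
    return warnings


# Hypothetical inputs: one awkward field name and too few rows.
issues = preflight_warnings(
    ["Food_Away", "Income ($)", "Region"], n_rows=2, n_predictors=3
)
for issue in issues:
    print(issue)
```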

Thanks for reading! Try out these tips and let us know how it goes. Please feel free to add tips of your own, or error messages that you’ve encountered and resolved when using the Predictive Tools. For ones not yet resolved, search the Forum, or create a new post!

 

Comments
CailinS
Alteryx

I also see questions come up when your fields don't show up in the tool configuration. This almost always happens because the tools are filtering out fields of certain types (for instance, the Association Analysis tool only shows you numeric fields in the bottom selection). It happens a lot when data comes in from a .csv as text and needs to be converted into a number (with a Select tool perhaps) before certain predictive tools will recognize it and let you use it.
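
To make the idea concrete, here is a small Python sketch that flags text columns whose values all look like numbers and are therefore candidates for a type change. It is an illustration only, not how the Select tool works. The function name `looks_numeric` and the sample columns are hypothetical.

```python
def looks_numeric(values):
    """True if every non-empty string in values parses as a number."""
    non_empty = [v for v in values if v is not None and v.strip() != ""]
    if not non_empty:
        return False
    for v in non_empty:
        try:
            # Note: float() also accepts strings such as "nan" and "inf".
            float(v)
        except ValueError:
            return False
    return True


# Made-up columns as they might arrive from a .csv, all stored as text.
columns = {
    "Sales": ["1993.33", "238.33", "694", ""],
    "Store": ["A12", "B07", "C33", "D01"],
}
convert = [name for name, vals in columns.items() if looks_numeric(vals)]
print(convert)
```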

DanC
Alteryx Alumni (Retired)

Hello,

 

I just wanted to share a recent experience I had troubleshooting a predictive workflow. The workflow involved AB Trend data and test data feeding into the inputs of the AB Controls tool. After cleaning up the measures data in order to get the AB Trend tool working correctly, and verifying that all of my test data ID records exist in the measures data, I thought that my issues were over. However, when running the workflow, the AB Controls tool would give the following error:

 

[Image: the AB Controls error message]

 

I had a difficult time trying to diagnose this as I thought my data was good in both streams. After all, the AB Trend tool was working fine, so I just trusted its output which was feeding the AB Controls tool. The test data was easy to check, so I knew it couldn't be that.

 

After a little more prodding and some guidance, I discovered that the results coming out of the AB Trend tool contained some nulls and another weird result (see highlight below).

 

[Image: AB Trend output with the weird result highlighted]

 

This indicates an infinite value (INF) in my measures data coming out of the AB Trend tool, likely due to a divide by zero issue in calculating the trend. Once I removed this UPC and the others that resulted in null, the rest of the workflow ran successfully.

 

This experience fits right up there nicely with the Predictive Troubleshooting theme: Check your data. All of it! :)
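
A check like the one described here can be sketched in Python as follows. It is a rough illustration, not part of the AB tools: the function name `split_finite`, the field name `Trend` and all values are hypothetical.

```python
import math


def split_finite(rows, field):
    """Split rows into (clean, bad), where bad rows hold None, NaN or infinity."""
    clean, bad = [], []
    for row in rows:
        value = row.get(field)
        if value is None or not math.isfinite(value):
            bad.append(row)
        else:
            clean.append(row)
    return clean, bad


# Made-up trend output with one infinite value and one null.
trend_output = [
    {"UPC": "0001", "Trend": 1.04},
    {"UPC": "0002", "Trend": float("inf")},
    {"UPC": "0003", "Trend": None},
    {"UPC": "0004", "Trend": 0.97},
]
clean, bad = split_finite(trend_output, "Trend")
print([row["UPC"] for row in bad])
```

Only the rows in `clean` would then be passed on to the next tool.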

 

DanC
Alteryx Alumni (Retired)

Regarding the above workflow troubleshooting, below are the related error messages from the error log with macro messages turned on in the Runtime tab of the workflow configuration. Those that appear in the message window in the designer are highlighted in bold and red.

 

00:01:42.149 - ToolId 4: AB Controls: R version 3.1.3 (2015-03-09) - x86_64
00:01:43.059 - ToolId 4: AB Controls: rgeos version: 0.3-12, (SVN revision 498)
00:01:43.061 - ToolId 4: AB Controls: GEOS runtime version: 3.4.2-CAPI-1.8.2 r3921
00:01:43.063 - ToolId 4: AB Controls: Linking to sp version: 1.2-0
00:01:43.066 - ToolId 4: AB Controls: Polygon checking: TRUE
00:01:43.888 - ToolId 4: AB Controls: Loading required package: FNN
00:01:43.961 - Error - ToolId 4: AB Controls: Error in get.knnx(as.matrix(the.data[, names(the.data)[sapply(the.data, :
00:01:43.964 - ToolId 4: AB Controls: Data include NAs
00:01:43.967 - ToolId 4: AB Controls: Calls: getKnn -> get.knnx
00:01:43.969 - Error - ToolId 4: AB Controls: Execution halted
00:01:44.337 - ToolId 4: Tool #141: R version 3.1.3 (2015-03-09) - x86_64
00:01:45.251 - ToolId 4: Tool #141: rgeos version: 0.3-12, (SVN revision 498)
00:01:45.253 - ToolId 4: Tool #141: GEOS runtime version: 3.4.2-CAPI-1.8.2 r3921
00:01:45.256 - ToolId 4: Tool #141: Linking to sp version: 1.2-0
00:01:45.258 - ToolId 4: Tool #141: Polygon checking: TRUE
00:01:45.451 - Error - ToolId 4: Tool #141: Error in if (uniq.cntrls) { : missing value where TRUE/FALSE needed
00:01:45.457 - Error - ToolId 4: Tool #141: Execution halted
00:01:45.464 - Warning - ToolId 4: Tool #228: The field "Right_X_._Seasonality" is not contained in the record.
00:01:45.480 - ToolId 4: Tool #227: 0 records were joined with 0 un-joined left records and 29638 un-joined right records
00:01:45.482 - Error - ToolId 4: Tool #141: The R.exe exit code (1) indicated an error.
00:01:45.484 - Error - ToolId 4: AB Controls: The R.exe exit code (1) indicated an error.
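
For a long log like this one, the "read the messages around the first error" step can be sketched in Python. The regular expression below is an assumption inferred from the lines shown above, not a documented Alteryx log format, and the function name `first_error_with_context` is hypothetical. The sample lines are copied from the log.

```python
import re

# Assumed layout: "<time> - [Error|Warning - ]ToolId <n>: <message>"
LOG_LINE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}\.\d{3}) - "
    r"(?:(?P<level>Error|Warning) - )?"
    r"ToolId (?P<tool_id>\d+): (?P<message>.*)$"
)


def first_error_with_context(lines, context=1):
    """Return the first Error message plus `context` messages on either side."""
    parsed = [LOG_LINE.match(line) for line in lines]
    for i, match in enumerate(parsed):
        if match and match.group("level") == "Error":
            lo = max(0, i - context)
            hi = min(len(lines), i + context + 1)
            return [p.group("message") for p in parsed[lo:hi] if p]
    return []


log = [
    "00:01:43.888 - ToolId 4: AB Controls: Loading required package: FNN",
    "00:01:43.961 - Error - ToolId 4: AB Controls: Error in get.knnx(as.matrix(the.data[, names(the.data)[sapply(the.data, :",
    "00:01:43.964 - ToolId 4: AB Controls: Data include NAs",
    "00:01:43.967 - ToolId 4: AB Controls: Calls: getKnn -> get.knnx",
]
nearby = first_error_with_context(log)
for message in nearby:
    print(message)
```

Here the line right after the first error, "Data include NAs", is the one that points at the cause.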

kempo1981
7 - Meteor

Hi, I have a different issue where Alteryx returns the error "rcorr(the.data, type = cor.type) : must have >4 observations"; however, I have 9 values that are to be included.....

 

Any advice would be appreciated!

 

[Attachments: Clipboard02 - Copy.jpg, Clipboard01 - Copy.jpg]

CailinS
Alteryx

Hi. Without seeing more of the data I can't be sure what is causing the issue. But ">4 observations" means more than 4 rows, so at least 5 (rows are different from columns, or fields). That said, 5 rows is the absolute minimum. I would not feel especially confident drawing conclusions from correlation statistics calculated off of so few observations. Also, I see that your data has some null values and, as you know, many R packages do not like nulls! So you'll want to replace them with a different value or filter out rows that have missing data (which would of course mean you have fewer observations). Hope this helps!
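
To show how filtering out rows with missing data shrinks the number of observations, here is a minimal Python sketch. The thread does not say whether the tool drops incomplete rows before calling rcorr, so treat that as an assumption. The function name `complete_rows`, the field `Other_Measure` and all values are made up; only `Business_Services_and_Solutions` and the ">4 observations" rule come from the thread.

```python
def complete_rows(rows, fields):
    """Return the rows that have a non-missing value in every listed field."""
    return [row for row in rows if all(row.get(f) is not None for f in fields)]


# Nine made-up rows: the target is always present, the other field is not.
target = [10.0, 12.5, 9.0, 14.0, 11.0, 13.5, 8.0, 15.0, 12.0]
other = [3.0, None, 2.5, None, None, 4.0, None, 3.5, None]
rows = [
    {"Business_Services_and_Solutions": t, "Other_Measure": o}
    for t, o in zip(target, other)
]

usable = complete_rows(rows, ["Business_Services_and_Solutions", "Other_Measure"])
# The error message asks for more than 4 observations.
enough_for_rcorr = len(usable) > 4
print(len(rows), len(usable), enough_for_rcorr)
```

Nine rows go in, but only four survive the filter, which would not be enough to satisfy the message.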

kempo1981
7 - Meteor

Thanks Cailin for the quick response. There are 9 rows showing with values for target variable (in this case Business_Services_and_Solutions) and whilst there are some nulls normally it will still run the analysis.

 

Appreciate the help.

 

Mark

Kandemiro
7 - Meteor

I've been getting this error several times as well. Even though both sets of input data seem to have the same format, I receive the error: No valid fields were selected. (Using the KNN algorithm.) Any suggestions? [Attachment: KNN.PNG]

Kandemiro
7 - Meteor

Never mind. After changing some formats, I apparently needed to run the workflow again. The error disappeared.