Since one of the 10.0.x upgrades I've noticed some unexpected behaviour when using the download tool to POST to a web service with authentication. I can see using Fiddler that Alteryx is carrying out two calls, one without security credentials present which fails, followed by another with the Authorization string which succeeds. This can result in conflicting information where download data is returned successfully but the download headers include the 401 Unauthorized return from the initial call. Screenshot of this below, DownloadData returned with DownloadHeaders from the first call.
This isn't causing any problem in my use case but it's certainly unexpected behaviour to me.
That's interesting. I haven't seen the same thing, but I'm wondering if it makes the initial call as part of the workflow/tool initialization. I think the impact would be felt if your workflow has built-in checks of the response field to ensure valid calls.
Building such a mechanism was promoted in the 'Web Scraping with Download Tool' webinar that was presented on Tuesday of this week. The instructor built a macro that looked only for a 200 (valid) response and filtered the other stuff out. If you have followed that method, you'd be missing the valid data.
Hopefully an Alteryx engineer could chime in on this one. Have you tried it in 10.1?
I should add that the mismatch between download data and headers is not consistent: on some occasions the header for the second call, the one including authentication, appears instead. The sequence of posts is totally consistent, however. There are always two, one with credentials and one without.
The inputD.txt file contains the content of the POST message and must be in the same folder where you’re running the curl.exe command. This command will ask you to input the account password. Once you do that, you will get two header files: -v and -u, one has the 401 message and the other the 200 message.
Since this is the expected behaviour for both the standard tool (curl) and the NTLM protocol, we cannot change the output. I attached a suggested modification of the workflow you sent to process the 401/200 header cases to address the decision path issue.
Explanation of the workflow:
Started by adding a RecordID tool to be able to join back the results to the original data stream. This is also useful if you have multiple responses.
With the Select tool, I keep only the RecordID and the Header field.
Using the Text to Columns tool, I split the Header results into separate rows with \n as the delimiter.
Using the Filter tool, I keep only the rows containing HTTP/1.1 to get rid of the rest of the header results.
Using the formula tool, I check if it’s a 200 or a 401 message and set a New Field of type Bool as 1 or 0 accordingly. In the case where there are both 401 and 200 messages, we will have both a 1 and a 0 for the same record ID. We will address this using the Summarize tool.
In the Summarize tool, I GroupBy RecordID and select Max of Successful, which will give me a 1 if there is both a 200 and a 401 message. It will also give a 1 if there is only a 200 message, and a 0 if there are only messages other than 200.
Finally, I join the result with the original data using the RecordID so the result is:
You can use the Successful field for further conditional checks.
You can modify the If statement in the formula tool to account for other message types if relevant.
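For anyone who wants to check the logic outside Alteryx, the steps above can be sketched roughly in Python. This is only an illustration of the split / filter / flag / max sequence, not the attached workflow itself. The function name successful_by_record, the sample header strings, and the regular expression used to pick out status lines are all assumptions made up for this example.

```python
import re

# Assumption: a status line starts with "HTTP/1.1" followed by a 3-digit code,
# which mirrors the "contains HTTP/1.1" filter described in the workflow.
STATUS_LINE = re.compile(r"^HTTP/1\.1\s+(\d{3})")


def successful_by_record(headers_by_record):
    """Return {RecordID: 1 or 0}, where 1 means at least one 200 status line."""
    result = {}
    for record_id, header_text in headers_by_record.items():
        flags = []
        # Text to Columns step: split the header into rows on "\n".
        for line in header_text.split("\n"):
            # Filter step: keep only the HTTP status lines.
            match = STATUS_LINE.match(line.strip())
            if match:
                # Formula step: 1 for a 200 response, 0 for anything else.
                flags.append(1 if match.group(1) == "200" else 0)
        # Summarize step: Max of Successful per RecordID (0 if no status line).
        result[record_id] = max(flags, default=0)
    return result


# Hypothetical header values, keyed by RecordID.
sample_headers = {
    1: "HTTP/1.1 401 Unauthorized\r\nWWW-Authenticate: NTLM\r\n\r\n"
       "HTTP/1.1 200 OK\r\nContent-Type: text/xml\r\n",
    2: "HTTP/1.1 401 Unauthorized\r\nWWW-Authenticate: NTLM\r\n",
    3: "HTTP/1.1 200 OK\r\nContent-Type: text/xml\r\n",
}

successful = successful_by_record(sample_headers)
```

Here record 1 has both a 401 and a 200 line and is flagged as successful, record 2 has only a 401 and is not, and record 3 has only a 200 and is. To treat other status codes as successful, change the comparison against "200" in the same way you would change the If statement in the Formula tool.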