In Alteryx Designer 2019.1, after adding Browse tools to all of the output anchors of a tool using Right-click > Add All Browses OR Ctrl + Shift + B, running the workflow returns an Unhandled Exception error. The workflow runs indefinitely until Designer is canceled from Windows Task Manager.
This article is part of the Tool Mastery Series, a compilation of Knowledge Base contributions to introduce diverse working examples for Designer Tools. Here we’ll delve into uses of the Pearson Correlation Tool on our way to mastering the Alteryx Designer.
This article is part of the CS Macro Development Series. The goal of this series is to communicate tips, tricks, and the thought process that goes into developing good, dynamic macros. In this part, we demonstrate how to read in multiple files with different schemas using a Batch Macro.
How to dynamically run the most recent file in a file folder
Sometimes you may have daily, weekly, monthly or yearly data dumps where you want to only run the most recent file. Within Alteryx you can make this process dynamic and seamless through the use of a few tools.
Step 1: Directory Tool
The Directory Tool will allow you to browse to a folder and return all the metadata related to the files which exist within that folder. The field of interest in the metadata is the 'Creation Time'.
Step 2: Sort Tool
Using the field called ‘Creation Time’ we can use the Sort Tool to sort the date and time values into descending order to get the most recent file at the top of the dataset.
Step 3: Sample Tool
After sorting the 'Creation Time' field I now have my most recent file in record 1. Yet, I still have rows of data for the other files within that folder that I need to remove. I can now use the Sample Tool to take the ‘First 1 Record’ and this will result in the latest file information being left.
Step 4: Dynamic Input
Currently, my dataset contains only the metadata available for that file, such as Full Path and Creation Time. To read the file and pull in its data, I now need the Dynamic Input Tool. In the ‘Edit’ section, select a placeholder file.
Then, under ‘Read a List of Data Sources’, set the ‘Field’ dropdown to the ‘Full Path’ field coming from the Directory Tool, and set the ‘Action’ dropdown to ‘Change Entire File Path’.
Step 5: Run the workflow
You can now run the workflow. Each time it runs, it will pick the latest file from that folder and read its data into Alteryx.
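For readers who want to see the same logic outside of Designer, the Directory, Sort, and Sample steps can be sketched in Python. This is only an illustration, not part of the attached module: the function name `latest_file` and its `pattern` parameter are assumptions made for this example. Note that `st_ctime` is the creation time on Windows, but on most Unix systems it is the time of the last metadata change.

```python
from pathlib import Path
from typing import Optional


def latest_file(folder: str, pattern: str = "*") -> Optional[Path]:
    """Return the file in `folder` with the most recent creation time.

    Hypothetical helper for illustration only; returns None if no file matches.
    """
    # Directory step: list the files in the folder.
    files = [p for p in Path(folder).glob(pattern) if p.is_file()]
    if not files:
        return None
    # Sort step: newest first by creation time (st_ctime is creation
    # time on Windows; on Unix it is the last metadata change).
    files.sort(key=lambda p: p.stat().st_ctime, reverse=True)
    # Sample step: keep only the first record.
    return files[0]
```

The returned path plays the role of the ‘Full Path’ field that is handed to the Dynamic Input Tool.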
Please find an example module attached to this article (Built in Alteryx Designer 10.5)
Can I read in an Excel file located in a zipped archive file from Amazon S3?
Unfortunately, this is not an option within the Amazon S3 Download Tool, as it only allows you to choose between CSV, DBF and YXDB files. However, this is possible within Alteryx with the use of a simple workflow utilizing a three line batch file, the Run Command Tool (master it here), and the AWS Command Line Interface (CLI).
In order to use the CLI, you must first download it and configure its settings. Please visit this page for information on how to do that. Once that is set up, you simply need to set up the batch file and configure the Run Command Tool.
In the first step, you will use a Text Input Tool to write the batch file code. This code will use the CLI to copy the ZIP file from the S3 bucket to a locally accessible drive. Configure the Text Input Tool as follows:
Make sure that line 2 points to where your CLI is installed.
In line 3, replace "alteryxtest" with the name of your bucket, "ExcelTest.zip" with the name of your ZIP file and enter in the correct location to copy the file to.
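The batch file itself is not reproduced in this text, so the following is only a rough Python sketch of the copy step it performs, assuming the standard `aws s3 cp` command of the AWS CLI. The function names, the `aws_exe` parameter, and the destination path in the usage note are hypothetical; the bucket and ZIP file names are the ones mentioned above.

```python
import subprocess
from typing import List


def build_s3_copy_command(bucket: str, key: str, destination: str,
                          aws_exe: str = "aws") -> List[str]:
    """Build an `aws s3 cp` command that copies one object to a local path.

    `aws_exe` is an assumption: point it at the CLI executable if it is
    not on the PATH.
    """
    return [aws_exe, "s3", "cp", f"s3://{bucket}/{key}", destination]


def copy_from_s3(bucket: str, key: str, destination: str,
                 aws_exe: str = "aws") -> None:
    """Run the copy; raises CalledProcessError if the CLI reports failure."""
    subprocess.run(build_s3_copy_command(bucket, key, destination, aws_exe),
                   check=True)
```

For example, `build_s3_copy_command("alteryxtest", "ExcelTest.zip", r"C:\Temp\ExcelTest.zip")` produces the argument list for copying the ZIP file to a hypothetical local folder.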
In the second step, you will use the Run Command Tool to do the following:
Write out the batch file ("Write Source")
Run the batch file created in the previous step ("Run External Program")
Read the file into the workflow ("Read Results")
When you fill in the "Read Results" section, your ZIP file will not exist yet, so you cannot simply navigate to and select the file. You have two options:
Click on the "Input" button and enter in the full path of where you are copying the ZIP file (found on line 3 of the Text Input tool) along with the file name, a pipe character, and then in brackets, the sheet name. For Example:
Run the workflow once without the "Read Results" section completed in order to copy the ZIP file from the S3 bucket. Then, click on the "Read Results" button and navigate to the ZIP file and choose the Archive file to read it.
This same workflow can be used to read other archived files as well. However, you will have to make slight adjustments to the "Read Results" section of the Run Command tool. For example, if reading in a CSV file, you would simply include the archived file name. Since a CSV file does not have "sheets", the bracketed sheet name is not needed.
I plan to create a simple macro with a user interface that will do the same thing. Once complete, I will post it in the reply section.
Thanks for reading!
The Sample Tool allows you to selectively pass patterns, block excerpts, or samples of your records (or groups of records) in your dataset: the first N, last N, skipping the first N, 1 of every N, random 1 in N chance for each record to pass, and first N%. These options come in clutch pretty often in data preparation, which is why you’ll find the tool in our Favorites Category. While it is a great tool for sampling your data sets, you can also use it for:
The Tool Mastery Series is a compilation of Knowledge Base contributions that introduce diverse working examples for Designer Tools. We've organized the links below to help you on your journey to mastering the Alteryx Designer!
In/Out
Date Time Now
Multi Field Formula
Multi Row Formula
Text To Columns
Apps and Macros
Numeric Up Down
Basic Data Profile
Test of Means
K Centroids Cluster Analysis
K Centroids Diagnostics
Amazon S3 Upload and Download
Block Until Done
Consider yourself a Tool Master already? Let us know at firstname.lastname@example.org if you'd like your creative tool uses to be featured in the Tool Mastery Series.
Stay tuned with our latest posts every #ToolTuesday by following @alteryx on Twitter! If you want to master all the Designer tools, consider subscribing for email notifications.
Welcome to the closing chapter of our voyage through the Pre-Predictive series! This has been a four-part journey introducing you to the thrilling world of data investigation. This section covers the plotting tools included in the Data Investigation Toolbox.
Welcome to a new feature in Alteryx 11.0: Data Profiling in the Browse Tool. We are excited about this and we hope you will be too! Data Profiling in the Browse Tool was created to help users better understand the quality of their data at any point within the workflow. It also helps with troubleshooting and with fixing issues that may arise when attempting to parse, join, or output data.
Sometimes multiple conclusions can be drawn from the same data. OK, often multiple conclusions can be drawn from the same data. This is especially the case with the Connection Progress that pops up between tools. You may be a bit familiar with this already. When you run a module, you may see something similar to the following:
114 GB of data is being passed through my data stream! Is this a lot? Well, yes, but ultimately we have to remember that Alteryx processes everything in memory. Knowing this, the information that we see above doesn't mean we have 114 GB of data being written directly to disk (many PCs don't even have this much available). Simply put, there is a ton of data there, but if you do not have any type of output connected to the tool, it stays in memory.
If we were to connect, say, a Browse Tool to the end of my XML Parse Tool shown above, the temp file written out by my Browse Tool would in fact be every bit of that 114 GB. Luckily, I don't really need the data written out at this point (I'm performing further analysis downstream), so I simply add a Select Tool just after this and deselect the field with the massive amount of data. Just like magic, my module runs quickly and efficiently.
This little bit of info can be both extremely valuable and scary at the same time. The value is simply that it shows you the amount of data you are dealing with. The scary part is that it is easy to assume this is all being written out to disk during runtime. We now know that as long as we're not attaching a Browse Tool to the data at this point, and we deselect the fields we do not need further downstream, we keep our module tidy and efficient!
Until next time, - Chad
Follow me on Twitter! @AlteryxChad
Alteryx is designed to use all of the resources it possibly can. To run as fast as possible, it tries to balance its use of CPU, memory, and disk I/O. The good news is that most of this resource utilization can be controlled: you can limit the amount of memory that is used at the system, user, or module level.
The Sort/Join memory setting is not a maximum memory usage setting; it’s more like a minimum. The part of Alteryx that benefits from having a big chunk of memory (sorts) will take that entire amount right from the start. It will be split between all the sorts in your module, but other tools will still use memory outside that Sort/Join block. Some of them (e.g. drive times with a long maximum time) can use a lot.
If a sort can be done entirely in memory, it will go faster than if Alteryx has to fall back to temp files, which is why it’s good to set this higher. But if the total memory usage on the system pushes it into virtual memory, data will be swapped to disk in a much less optimal way and performance will be much worse, which is why setting it too high is the bigger concern.
The Default Dedicated Sort/Join Memory Usage can be found in the Designer at Options > User Settings > Edit User Settings.
Best Practices on Memory Settings
32-bit machines*: The setting should be on the lower, conservative side. No matter how much actual RAM is installed, at most 1 GB is available; as soon as the setting goes higher, the machine will cross over into virtual memory and be unable to recover. A 32-bit machine should never have a setting over 1000 MB, and 512 MB is a good setting. Set it low (128 MB), especially when using Adobe products simultaneously with Alteryx.
64-bit machines: Set this in the system settings to half your physical memory divided by the number of simultaneous processes you expect to run. If you have 8 GB of RAM and run 2 processes at a time, your Sort/Join memory should be set to 2 GB. You might set it lower if you expect to be doing a lot of memory-intensive work on the machine besides Alteryx.
Set your Dedicated Sort/Join Memory Usage lower or higher on a per-module basis depending on how the computer is being used: lower it when doing memory-intensive non-sort work (e.g. large drive times), and raise it when doing memory-intensive sort work.
*Please refer to this link for additional details on 32-bit support for Designer
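The 64-bit rule of thumb (half of physical memory, divided by the number of simultaneous processes) is simple arithmetic, and can be sketched as follows. The function name `sort_join_memory_mb` is hypothetical and is not part of any Alteryx API; it only reproduces the calculation described above.

```python
def sort_join_memory_mb(physical_ram_gb: float, simultaneous_processes: int) -> int:
    """Suggested Sort/Join memory in MB for a 64-bit machine.

    Rule of thumb from the text: half of physical memory divided by the
    number of processes expected to run at the same time.
    """
    if simultaneous_processes < 1:
        raise ValueError("simultaneous_processes must be at least 1")
    half_of_ram_mb = physical_ram_gb * 1024 / 2
    return int(half_of_ram_mb / simultaneous_processes)
```

With 8 GB of RAM and 2 simultaneous processes, this gives 2048 MB, which matches the 2 GB example above. Treat the result as a starting point and lower it if the machine also runs other memory-intensive work.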
Here are the two most recommended best practices for optimizing module speed:
Disable your Browse tools
Disable your browse tools via the Module Properties window. This checkbox can allow your module to run faster, and take up less memory and temp space by preventing Alteryx from having to generate the content (temporary .yxdb files) that must be displayed in your Browse tools.
Close the Output Window
If you find that your module produces warnings, go ahead and close the Output Window in order to speed things up. When the Output Window is asked to display hundreds or thousands of lines of information, it can slow down your module. To reopen the Output Window, go to View > Show Output Window.
The Auto Field tool examines your data and automatically optimizes the field type and length. Take a look at your data with a Select tool, follow this up with an Auto Field, and follow that up with another Select tool to see what kind of changes you’ve made. After you run the module, you can examine each Select tool to get a before-and-after view of the adjustments made to the fields.
You can even take this a step further and add a few Browse tools to see how much your database actually decreases in size. You may be surprised by how much! In the view below, the file size was reduced by about 40% with the Auto Field tool, on just 50,000 records and one field. Now consider running a file of millions of records, and the size decrease becomes really substantial!
(Figures: Before Auto Field view; After Auto Field view.)
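To give a feel for what "optimizing the field type and length" means, here is a small Python sketch that suggests the most compact type for a column of text values. It is an illustration only: the rules the Auto Field tool actually applies are not described here, and the function name `smallest_field_type` and the type labels it returns are assumptions made for this example.

```python
from typing import List


def smallest_field_type(values: List[str]) -> str:
    """Suggest a compact field type for a column of string values.

    Hypothetical rules for illustration; not the Auto Field tool's logic.
    """
    non_empty = [v for v in values if v != ""]
    try:
        numbers = [int(v) for v in non_empty]
    except ValueError:
        numbers = []
    if not numbers:
        # Not a whole-number column: size a string to the longest value.
        longest = max((len(v) for v in values), default=0)
        return f"String({longest})"
    lo, hi = min(numbers), max(numbers)
    # Pick the narrowest integer type that holds every value.
    if lo >= 0 and hi <= 255:
        return "Byte"
    if lo >= -32768 and hi <= 32767:
        return "Int16"
    if lo >= -2**31 and hi <= 2**31 - 1:
        return "Int32"
    return "Int64"
```

A column of small whole numbers stored as wide strings is the kind of field where shrinking the type saves the most space per record, which is why the savings grow with record count.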