Engine Works

Under the hood of Alteryx: tips, tricks and how-tos.
peldridge
Alteryx Alumni (Retired)

We members of the data team crunch a LOT of numbers! Sometimes, our modules run agonizingly slow, so we decided it was time to review our best practices for Alteryx performance. Yours truly took on the task of gathering and compiling the available company knowledge regarding Alteryx Best Practices. And now it is time to unveil the result! Enjoy.

 

Thanks once again to everyone who contributed.

 

Coding Practices

 

Keep it Lean for performance!

  • Don't handle data you're not going to process.
    • Place a SELECT tool immediately after your INPUT limiting your data stream to the fields that you are concerned with, do not check "*Unknown". Add descriptions. Even if you are not currently limiting your data stream, adding this tool now could save you headaches down the road if the architecture of the INPUT is outside your control.
    • If the input is SQL, use the SQL editor to limit the fields coming in
    • Place a FILTER tool immediately after the SELECT reducing your data stream further.
    • Be careful with spatial processing and make sure that you are not creating objects that you don't need. They are large and can slow things down.
    • Eliminate unnecessary columns in joins.
  • Don't process unneeded whitespace.
    • Update the size of the string data types to be real-world, not the default. US state codes are a string(2), leaving it at V_String(254) is wasteful. You're saving space and processing time by explicitly stating the size, less work for the engine.
    • Appropriately size your columns, use the autofieldstrings – sometimes columns are absurdly and unnecessarily wide
  • Be number savvy.
    Use a FORMULA tool to convert strings to numbers if you'll be doing math on them downstream. Number data types are conservative with their space usage. Also, doing this all upfront makes for a cleaner canvas as you're not having to add FORMULA tools before a SUMMARIZE, for example. Disclaimer: some strings can be converted to other data types in a SELECT tool, caveat emptor.
  • No doodling.
    Remove BROWSE DATA and BROWSE MAP tools. Rendering that data can be expensive. Once a module is ready for prime time, disable the browse tools. On the advanced tab select "disable all browse tools". Also, there is no reason to have a browse tool in an automated process.

Stay organized. Modules can get very complex, quickly, and it help to have things done clearly. Do this in your own best interest. It's amazing how much you can forget quickly, especially when the pressure is on to get things finished.

 

  • Plan your algorithm first! Then, you can start actually placing tools, and configuring those tools to do what you want the module to do, after you know a general plan.
  • Make it readable!

     

    • Lay it out neatly, so the flow is easier to track
    • Separate data processes and reporting processes in the module so you can quickly improve and trouble shoot.
    • Use color coding to identify the purpose of various groups of tools in a module. For example, putting a red box around groups of tools that are only used for debugging or a green box around tools that receive wizard input.
    • Turn off annotation when building large modules.
    • Name connections coming into reporting tools. This allows easier organization of report snippets and helps with the legend in mapping.

Document, Document, Document!

  • Document everything in the module using text tools.
  • Comments should be written for someone who is not familiar with the module, not for someone who is familiar with it.
  • Consider external documentation for the more complex modules.
  • Put a simple text line with your name and date, sometimes with a comment as to what was revised. This is a simple text box with the borders turned off, shoved into an out of the way corner.

Quality Control

  • Write tests.
  • Use the event function and email
    • consistent subject lines in the automated emails so that Outlook can catalog them quickly
    • text in the body of error messages that tells the user *what* to do, not just "it's broken"
    • This isn't necessary in DNow code. When an Alteryx module fails, we get a standard error e-mail which has enough information for us to trace the problem.
  • Peer code review, having someone else present your work in a group setting where they are responsible to explain the why, what & how will quickly demonstrate how not everyone thinks alike ... there are gazillions of resources on the Internet for facilitating effective peer review
  • When working on an automated process, make sure you know what version of Alteryx is on both beta and live sites, and develop on the earliest version (or downgrade to the earliest version if necessary).

Don't reinvent the wheel

  • Build a Macro when creating a process that will be used on multiple occasions.

Wizard-Related Coding Practices

 

  • Update tool values with full replace or formula – whenever possible, refrain from doing a "replace specific text" option as often times the tool will be reconfigured in the UI and that value may no longer exist.
  • Try to keep conditional statements/flow in the wizard update/action properties, not within the tool update formula itself – this just makes it a bit easier to see the update flow more quickly.
  • Give good descriptions for each wizard input and update action – dealing with multiple tools and updates that have no description is a bit tedious.
  • Provide default wizard input values so the user has at least a general idea of what needs to be submitted.
  • Use relative paths for datasets. Avoid hard coded values of datasets if/when possible. As data changes think of ways a module or wizard can accommodate changing data or sets of data without having to reconfigure tools each time. It can be time consuming to track down problems when wizards are integrated into sites and down the road data sets are changed. Also, mixing relative and non relative paths can be confusing.
  • Rename tools when building complicated wizards. When renaming tools, use a standard
    • sel_
    • fil_
    • join_
    • mjoin_
    • for_
  • Web developers should mark the tools updated through the wizard interface.
  • Wizard input descriptions should closely match the value names, and the order they are written in the plugin code should match the order in the test wizard. We debug a wizard problem by loading the test wizard with data from the error report, and if the names are out of synch, and/or in different orders, it can be difficult to figure out what goes where. This greatly helps with debugging.

Settings

 

  • Make sure you've got all of the user settings tailored to what suits you.
    • For example, you could turn the XML view on. With this checked, you have another option to view the tool's XML in the properties window. Very helpful when trying to determine previous settings, Allocate settings, etc...
      • Tools->User Settings->Edit User Settings->Advanced
      • Check 'Display XML in Properties Window'
    • turn the Snap to Grid off:
      Unchecking really helps to make straight lines in the modules and keep track of where things go.
      • Tools->User Settings->Edit User Settings->Document
      • Uncheck 'Snap to Grid'
  • Maximize the temp space dedicated to Alteryx

 

Tools

 

  • Rename spatial objects to what they are so that tools downstream can be configured properly.
  • When performing a spatial match (pt in polygon)have the polygons specified as the Target
  • Use the dynamic input tool for spatial matching
  • Use yxdb files as input rather than XLS or other flat file options
  • If the input is SQL, use a SQL login not a network login
  • When setting up large data sets (data1 to data2) type fuzzy matches, it saves a ton of time and processing if you do a join upstream on the fields you will be doing the fuzzy matching on and eliminate them from the fuzzy match because they were exact.

Paula Eldridge

Comments