This site uses different types of cookies, including analytics and functional cookies (its own and from other sites). To change your cookie settings or find out more, click here. If you continue browsing our website, you accept these cookies.
Far more than just a window to your data, the Browse Tool has a catalog of features to best view, investigate, and copy/save data at any checkpoint you place it. That introspection to your data anywhere in your blending gives valuable feedback that often speeds workflow development and makes it easier to learn tools by readily visualizing their transforms. Be equipped, and browse through the catalog of useful applications below!
Data blending, transformation and cleansing..oh my! Whether you're looking to apply a mathematical formula to your numeric data, perform string operations on your text fields (like removing unwanted characters), or aggregate your spatial data (among many other things!), the Formula Tool is the place to start. With the examples provided below, you should be on your way to harnessing the many functions of the Formula Tool:
The Multi-Row Formula Tool functions much like the normal Formula Tool but adds the ability to reference multiple rows of data within one expression. Say, for example, someone was on the ground floor of a house and had a Formula Tool. They would only be able to talk to the people also on the ground floor. If they had a Multi-Row Formula Tool, though, they would also be able to talk to the people upstairs, in the attic, and in the basement as well.
Error while connecting to MySQL, from the Input Data Tool - Error: Input Data (13): Error SQLDriverConnect: [Microsoft][ODBC Driver Manager] The specified DSN contains an architecture mismatch between the Driver and Application.
Characters that are not on a standard English keyboard may need translation into Unicode or a language-specific code page for Designer and database drivers to read them correctly.
Characters with incorrect encoding will often appear as boxes or question marks in the Designer Results screen and error messages.
Unicode characters take more bytes than English ASCII characters. Changing the column type and increasing the column size may be needed. In Designer, the column size is the number of characters, not the number of bytes.
How To: Create a Calgary Database
A Calgary Database is a proprietary Alteryx format that allows users to query against a file of millions of records quickly without having to read in all the data. A Calgary Database is created with the Calgary Loader tool, which allows users to create a database from any type of input while selecting which fields to index. Calgary Databases are useful for running ad-hoc queries against a large dataset, e.g. ConsumnerView data.
Alteryx Designer (any version)
Bring the data to be written to the Calgary Database into Alteryx and transform it until it is in the desired format, keeping in mind opportunities to standardize values to make the indexes work better. For example, are all your ZIP codes properly and consistently formatted? Starting with Alteryx 5.0, Calgary Indexes are not case sensitive anymore, treating "CALIFORNIA" and "California" the same. However, if some of your data uses the full state name and some uses the state abbreviation, and you are planning on using state as an index, you should pick one and use it consistently. You might also want to add flags that other users might find useful for querying data. For example, create a flag to indicate the current month's (or quarter's or year's) data or a custom region such as "NorthEast", "South", "Midwest", etc.
In the Calgary Loader tool, map the location of the Calgary Database in the "Root File Name" box. The tool will create a .cydb file (the data file), multiple .cyidx files (the index files), and an __Indexes.xml file that contains the index values. Since it will write out multiple files, a best practice is to have a folder dedicated to your Calgary Database. You cannot append to a Calgary database. To add records, rebuild the Calgary database.
Use the "Data" and "Index" columns to select which fields to include as data fields and which to index. Typically, all fields are included as data, but only certain ones are indexed as each index takes time to create. For index fields, the index type can be selected. "High Selectivity" is used for data with many different possible values, such as ZIP codes. Select "Low Selectivity" for data with fewer unique values, such as State or Region. "Low Selectivity" also creates a drop-down option for the Calgary Input Tool. By default, the index type setting is "Auto". In Auto mode, Alteryx looks at the first 1 million rows of data and decides if the index should be high or low selectivity. All fields with more than 550 unique values will be set to high selectivity. If the data changes after the first 1 million rows, Alteryx might select the incorrect index type. This option might also take longer to process since Alteryx has to look at 1 million rows of data for each index in Auto mode.
Use the Calgary Input tool to read in data from a Calgary database as described here: Querying a Calgary DB / File to Select and Limit Input Records. Did you know that you can read in the .cydb file the same as a .yxdb file in the regular input tool? However, you won't be able to query any of the indexed fields.
Using Custom Lists to query Calgary Indexes in Apps and Macros
Querying a Calgary DB / File to Select and Limit Input Records
Building a Calgary Database with "Searchable" Fields
The Select Tool within the Alteryx Designer is the equivalent of your High School Sweetheart. Always there when you needed them and helped you find out more about yourself. The Select Tool can do exactly this by showing you the data type and structure of your data, but it also gives you the flexbility to change aspects of your dataset.
Saving or running the workflow in the Designer causes the following error to occur:
An Unhandled Exception occurred. A previous action may not have completed successfully. Click OK to send the development team the error log so that we can fix this error in a subsequent release.
Checking the logs from %PROGAMFILES%\Alteryx\ErrorLogs\AlteryxGUI shows the following error:
Alteryx Designer x64 - 2019.2.5.62427 Type: System.ArgumentException Message: Cannot have ']]>' inside an XML CDATA block. Source: System.Xml OS Version: Microsoft Windows NT 6.2.9200.0 OS Is x64 Capable: True Selected Plugin: LockInGui.LockInSelect.LockInSelect Processor: Intel(R) Xeon(R) CPU E5-2676 v3 @ 2.40GHz Private Memory: 380923904 -------------------------------------------- at System.Xml.XmlTextWriter.WriteCData(String text) at System.Xml.XmlElement.WriteElementTo(XmlWriter writer, XmlElement e) at System.Xml.XmlElement.WriteContentTo(XmlWriter w)
The use of "]]>" in the following tools (not an exhaustive list) causes the error:
Report text tool
Avoid using "]]>" in the tool or escape the ">" to ">".
You have multiple fields in your data that correspond to customer addresses. Some customers may have more than one address listed in their record. However, you want to whittle that list to one address per customer. That one address is the first, non-null address found when moving sequentially across a set of fields that contain address data.
For our example, we have three fields of data related to addresses: Address1, Address2 and Address3. The preferred field of data to use is Address1. However, if Address1 does not contain data, then we’ll use the data in Address2. If both fields of Address1 and Address2 do not have data, then we’ll use the data in Address3. The final output should contain a single list of the addresses highlighted in yellow in Figure 1.
Figure 1: The final output will contain a single list of the data highlighted in yellow.
Method 1: Write a Conditional Statement
The most common approach to this type of problem is to craft a conditional statement to select data based on a set of criteria (in this case, order). In the case of our data, it would look something like this:
IF IsNull([Address1]) AND IsNull([Address2]) THEN [Address3]
ELSEIF IsNull([Address1]) THEN [Address2]
ELSE [Address1] ENDIF
However, what if I had 20 fields of addresses instead of 3? Just getting that statement with three fields took me too long to write out! If you do have 20 fields, you might want to start typing that Formula now….
IF IsNull([Address1] AND IsNull([Address2]) AND IsNull([Address3]) AND IsNull([Address4]) AND IsNull([Address5]) AND IsNull([Address6]) AND IsNull([Address7]) AND IsNull([Address8])...
You get the idea. And now you’re thinking, “You’re going to tell me there’s a better way, right?!?” Well, yes...I am!
Method 2: Data Manipulation
An alternative method of solving this problem is to manipulate the data using the Transpose, Filter and Sample tools. I’ll share some advice from @RodL here: “If you want to make something…truly dynamic, then the ‘best practice’ when you are dealing with an unknown number of columns is to ‘verticalize’ the data”. In our case, we may know the total number of columns of address data we have; what we don’t know is which column the data we want is actually in.
Following @RodL’s suggestion, we’ll ‘verticalize’ the addresses using the Transpose tool. This stacks the addresses for each customer in order of the fields in the table (Figure 2). We’ll use the Client ID (or Record ID, if you’ve chosen to add one) as our Key Field and the fields that contain address data as our Data Fields.
Figure 2: All address fields per Record ID (or Client ID) are stacked vertically in order of the field sequence.
Since Null values are not usable records for our purposes, we’ll use a Filter to identify all the usable data (non-Null values). Now that our usable data is stacked vertically in order of field selection, we can Sample the first record from each Record ID (or Client ID) group. We’ll configure the Sample tool to identify the First N (where N = 1) Records from a group (Group by Record ID or Client ID).
Figure 3: Sample the first record from every Record ID or Client ID group.
After some data clean-up with a Select tool, we're left with a column of the selected address for each of our customers:
You monitor the mileage of multiple trucks as they deliver shipments over the course of a week and record additional information regarding each truck in a file (Truck Metrics). Each truck’s cumulative mileage per day is recorded in a separate file (Truck Mileage). Your goal is to update the mileage-related fields in Truck Metrics with the values recorded in Truck Mileage. Note: today is Tuesday so only fields for Monday and Tuesday will be updated in in the Truck Metrics file.
Manually Select Fields to Update
Whether the data is uniquely identified by a Truck Number (or Record ID) or identically ordered in both files, data from Truck Metrics (the table to be updated) and Truck Mileage (the data used for updating) can be Joined together. Then, using the Select functionality within the Join tool, I can manually choose the fields from Truck Mileage that I want to use to replace the fields that need to be updated (Figure 1).
Figure 1: Fields from Truck Mileage (Yellow) replace fields from Truck Metrics (Blue). Note that fields in yellow are selected while fields in blue have been deselected. Fields that need to be included from Truck Metrics (Red) remain selected.
Fantastic! A simple, straightforward way to update fields! But, as any analyst knows, working with data is rarely simple or straightforward. What if you’re dealing with 20 fields that need to be updated, not just 2? In that case, manually selecting fields to update is not only tedious but also error-prone. For these types of situations, I recommend a process that allows for a more dynamic approach.
'Verticalize' the Data to Dynamically Update Fields
Transposing, or ‘verticalizing’ data, allows for a more dynamic workflow when you have unknowns in your processing. In a scenario such as this one, you may have an unknown or changing number of fields that will need be updated in Truck Metrics. Using this approach, we’ll first Transpose both data sets to configure the Field Name and its associated value in a single row (Figure 2).
Figure 2: The data is transposed from Truck Mileage and Truck Metrics. The highlighted fields in Truck Metics indicates which fields will be updated, as the same field exists in Truck Mileage.
Then, we’ll Join our datasets based on two fields: Truck Number AND Name. This ensures that fields in Truck Mileage will match to the correct fields in Truck Metrics, assuming the fields names in both tables are named in the same way. The only selecting we’ll have to do is to make sure all fields from the Right data source (in this case, Truck Metrics) are deselected (Figure 3). This allows that, in the situation of matched Truck Numbers and field names, the updated values will be used.
Figure 3: The Joined fields indicate the fields that exist in both Truck Mileage and Truck Metrics. Fields in yellow (Truck Mileage) are selected to reflect updated values in downstream tools. Fields in blue (Truck Metrics) are deselected.
Note that any unmatched fields have fallen out of the Right side of the Join:
To add them back into the data stream, simply Union the Center and Right Joins together, setting the tool to “Auto Configure by Name”. Then, to rearrange the data into its original format, use the Cross Tab tool (Figure 4). And, voila! Your data is updated!
Figure 4: The updated fields are highlighted in yellow. All other fields originally included in Truck Metrics are included the in the final results as well.
Recently a couple questions came across the Customer Support desk asking how a fiscal calendar could be incorporated into a workflow. Natively Alteryx doesn’t have a tool to create one, but Alteryx does have a number of tools to make a fiscal calendar. Here is an example of how this can be done.
1. Determine the start and end dates of the fiscal calendar and enter them into Text input tool, each on an individual row
2. Connect a TS Filler tool to generate dates between the start and end dates
3. A Select Tool was added to remove unnecessary fields
4. Add a Multi-Row Formula Tool to create a Day of Week field by assigning each row a day of the week from 1 to 7
5. Add another Multi-Row Formula Tool to calculate the fiscal week. Whenever the Day of Week is 1 add a value of 1 to the previous row’s fiscal week value. This will create a running week value for the entire year
An example workflow is attached. Also in example workflow is an example of how the fiscal month and week of month may be added. (Workflow is in Alteryx 10.6 version.)
Unlike a snowflake, it is actually possible for duplicates exist when it comes to data. To distinguish whether or not a record in your data is unique or a duplicate we have an awesome tool called the Unique Tool that will actually turn your data into a unique snowflake.