This site uses different types of cookies, including analytics and functional cookies (its own and from other sites). To change your cookie settings or find out more, click here. If you continue browsing our website, you accept these cookies.
Between the RegEx , Text To Columns , and XML Parse Tools , the Alteryx data artisan already has an exceptionally robust selection of tools to help parse uniquely delimited data. However, there are still some data sets so entangled in formatting that it’s labor intensive to parse even for them. Enter the Find and Replace Tool , which captures the ability to find your nightmarish parsing workflows and replace them with sweet color by number pictures. Just kidding, it finds bad jokes and replaces them with good ones. Seriously, though, you could do both if you wanted to because this tool has the capability to look up a table of any number of specified targets to find in your data and will replace them with a table of specified sources. With the help of a few quick configuration steps, this tool can simplify some parsing use cases significantly.
RegEx….it can be tough, but extremely useful when looking for ways of extracting information from a string. Regular Expressions are basically a code you can write to match a specific set of characters (it's a pattern matching syntax). It could be something as simple as finding the three digit number in the string “Eample123Eample” or something more complicated like using hexadecimal to select a certain range of characters. RegEx is something that can be used in Alteryx via the RegEx tool in the Parse tool set and some of the main uses for RegEx are replacing, matching, and parsing specific characters within a string. Replacing text is a very handy process when working with string fields, as it can be used to replace virtually any set of characters. In the attached example module you will find the following data in Text Input Tool: What I would like to do for this example is replace the portion of the string “file” and only that portion. The easiest was to do this using the RegEx tool is to follow these steps:
Make sure that the correct field is selected in the Field to Parse drop down.
Enter “file” as my RegEx in the Regular Expression text box.
Make sure that your Output Method is set to Replace.
Finally, enter a replacement text in the Replacement Text text box.
In this example, I choose “FileNumber_”. Once a browse is connected and the module is run, you can see that the output have replaced all of the character matching “file” with “FileNumber_”. Using RegEx to match is another process that can be very helpful for your data preparation or transformation process. Using the Match output method will assign a True or False value for you data based on the RegEx that you have written. I’m going to use the file initial data that was used above, but this time I was to match only the records that are “file1…”, “file2…”, and “file3…”. To do this, I have written a regular expression of: file  .+ . The breakdown of the RegEx script is as follows:
First, I am searching for “file” much like I did in the replace example.
Next, I am using brackets “”, to only look for the numbers 1, 2, or 3. Square brackets are used to match specific characters, just like an OR statement in a Formula Expression.
And lastly, “.+”. The period will look for any character and the plus will look for one or more additional characters. This covers the file extension in our data.
As you can see in the below output, only records that had a file name of file1, file2, or file3 have been returned as a True match, while file4 returns False: Parsing with the RegEx tool can be a little bit more complicated, but again it is an extremely useful process that will help in your data prep and transformation stages. The nice thing about parsing with the RegEx tool is that it also allows you to alter the fields upon output. Once you have a valid regular expression for parsing, you will be able to change the field name, the data type, and the size of the field. The trick to parsing using RegEx is that you have to create a group, and this is done by using parenthesis “()” to identify each group. You can use multiple group parse into multiple field, but for this example there will just be one group. Using the same initial file name field, we will use a regular expression to parse just the file name without an extension. For this I have used the following expression: ^(.+) . .+$ . Just like before, the breakdown is as follows:
Using a “^” in RegEx means to look at the start of the string. This portion “(.+)” is looking for my group. In this case, any number of characters. And since I added a “^” to the beginning, it will look at the beginning of the string.
The ".." portion will look for the period in between the file name and the extension. Using a slash before the period will tell RegEx that you are actually looking for a period (the slash is called an Escape character). Otherwise a period is a wild card, meaning it will return any character.
The "+$" is the last portion where we look for the extension. Using “.+” will look for any characters, which will account for all of the extensions, while the dollar sign is making sure that this is looking at the end of the string.
Here is the outcome of using the parse method to find just the file name: I hope this helps you get started with regular expressions! As always, if you have any questions please don't hesitate to email firstname.lastname@example.org. Special thanks to Mike Akey for putting together this post, until next time!
One of the greatest strengths of modern web APIs is their flexible, developer-friendly nature, which provides numerous options for both the provider and the user. However, this flexibility can make it more intimidating for business users to deal with the various data formats that these APIs provide. The purpose of this article is to familiarize you with the main data formats used by the vast majority of web APIs, and provide the basic knowledge that will allow you to confidently process the data they return into a typical tabular format.
The Fuzzy Match Tool provides some pretty amazing flexibility for string joins with inexact values – usually in the case of names, addresses, phone numbers, or zip codes because many of the pre-configured match styles are designed around the formats of those types of string structures. However, taking advantage of the custom match style and carefully configuring the tool specific to human entered keyword strings in your data can also allow you to use the loose string matching feature of the tool to match those values to cleaner dictionary keyword strings. If done properly, it can help you take otherwise unusable strings and, matching by each individual word, recombine your human entered data to a standardized format that can be used in more advanced analyses.
As currently designed, the Amazon S3 Download tool only allows one file, or object, to be read in at a time. This article explains how to create a workflow and batch macro that will read in the list of objects in a bucket and allow you to filter for the file(s) you want using wildcards!
One of the biggest reasons why people love Alteryx is that it has the ability to read a very large number of different data sources. This article includes a workflow that is able to read in non-natively supported formats like a Word doc or pdf by using a open source program to convert these formats to plain text.
The RegEx tool is kind of like the Swiss Army Knife of parsing in Alteryx; there are a whole lot of ways you can use it to do things faster or more effectively, but even if you just use the blade it's still immensely useful. Sometimes that's all you need, but if you do take the time to figure out how to use a few other tools in that knife, you'll start to see that there isn't much you can't do with it .
These methods should work in most versions of Alteryx up to and including 10.1.
One of the most common issues we see from clients trying to read in a .csv file is that they are receiving an error starting “Too many fields in row x”, and because of this Alteryx can’t read in the file.
There are a couple of different ways you can resolve this error.
First, in the Input Data tool, you can tell Alteryx to treat read errors as warnings to allow the file to be read in.
This will change the Error to a Warning so that Alteryx can read in the file so you can investigate.
The other method is to read the file in with no delimiter. You can do this by changing the delimiter in the Input Data tool to \0.
Once you have your data parsed back out into its fields, you can use the Dynamic Rename tool to correct your field names, a select tool to remove the original field, and a simple Trim() function to remove the extra delimiter from your data.
This process is illustrated in the attached workflow, created in version 10.1.
Date/Time data can appear in your data in string formats (text fields) or date formats. The DateTime Tool standardizes and formats such data so that it can be used in expressions and functions from the Formula or Filter Tools (e.g. calculating the number of days that have elapsed since a start date). It can also be used to convert dates in datetime format to strings to use for reporting purposes.
The Sample Tool allows you selectively pass patterns, block excerpts, or samples of your records (or groups of records) in your dataset: the first N, last N, skipping the first N, 1 of every N, random 1 in N chance for each record to pass, and first N%. Using these options can come in the clutch pretty often in data preparation – that’s why you’ll find it in our Favorites Category, and for good reason. While a great tool to sample your data sets, you can also use it for:
In a workflow, not too far, far away...
Structured data has vanished. In its absence, the sinister Dirty Data Order has risen from the ashes of the Server and will not rest until Data Analytics have been destroyed.
With the support of the Alteryx Engineers, Solutions Engineer Tony Moses leads a brave RESISTANCE. He is desperate to find structured data and gain its help in restoring blending, joining and analytics to the galaxy.
Tony has sent his most daring Community Leader, Matt DeSimone, on a secret mission to Jakku, where an old ally has discovered a clue to the structured data whereabouts....
Welcome to the Star Wars universe!
Ever wanted to know the most important details of your favorite characters from Star Wars? Me too!
Our generous friends, Paul Hallett and team, have given us the Star Wars API - the world's first quantified and programmatically-accessible store of Star Wars data.
After hours of watching films and trawling through content online, Paul presents us all the People, Films, Species, Starships, Vehicles and Planets from Star Wars.
The data is formatted in JSON and has exposed it to us in a REST implementation that allows us to programmatically collect and measure the data.
Now, how was I able to retrieve this treasure of information via Alteryx? Easy! I've built a REST API connection using the Download Tool to pull information based on a user inputted query in an Alteryx Application (attached as v2018.1 Star Wars.yxwz).
Normally, once having retrieved JSON formatted data, structuring and parsing the data would be a nightmare! With Alteryx, this is just one tool away. The JSON Parse Tool allows you to identify the JSON field, in this case our download data field, and easily extract Name and Value columns. From there it's some simple formatting and using the reporting tools to present us a nice clean composers file (pcxml).
Man, if only the Rebels could process information as fast as Alteryx then they wouldn't have had to send poor R2 to find Obi Wan.
I'll be bringing you, the Alteryx Community, updates of the app with each new movie release!
I hope you enjoy the API and may the Force be with you!
Easily the most used tool in the parsing category , the Text To Columns Tool makes for an extremely quick dicing of delimited fields. To use it you only need to specify a delimited field, delimiter(s), whether you’re parsing to rows or columns (you’ll need to specify a number of columns to parse into with this selection) and you’re off. Any way you slice it, this tool has you covered:
Question I have a table of sales data with each column being a week's worth of sales. I only want records that have data in each of those fields and want to filter out all records that have Null values. How can I do this?
Answer There are two basic elements necessary to make this happen. The first is that all records in the original table have a unique ID. If you do not have a unique ID in your data, go ahead and add a Record ID Tool.
In the sample data you can see we will want data from Rows 1 and 6 while filtering out each of the other records because they contain null values.
From here we will use the Transpose Tool to pivot your data into 3 separate columns. In the transpose field choose your unique ID as the KEY FIELD and make sure all other records are selected as DATA FIELDS.
The result is that you will have your unique ID field, a field called [Name] which contains the names of each of the fields in your data, repeated for every unique ID in your original data, and a [Value] field which contains the individual values for each of the records for each of the columns in the original data.
Now we want to search for Nulls, and get a comprehensive list of the UniqueID values that do not contain Null values. Now is the time to bring in a Summarize tool and GroupBy your unique ID field, and then use the CountNull action.
The result is a list of how many nulls exist in each of your unique ID groups.
Next we can simply filter out the fields that have 0 null values in them and then use the unique IDs to join back to the original data, and pull only those records.
It's important to note here that because I'm only interested in the original fields I intentionally chose to deselect the unique ID and the Null Count fields from the output of the join so that I am left with only those records that have data in all of the weeks.
See the attached v10.5 workflow for an example of the approach above.
You may have seen the modulo (mod) function used in the awesome Collatz Conjecture macro on the Gallery; here, it's used to determine the parity of the entered integer. Depending on if the number is even or odd, a different set of operations is applied to the value.
If you're using the Formula Tool in v11.0+, be sure to familiarize yourself with the tool's interface redesign!
Often, clients try to format a plain numeric value to a currency format. A simple way to accomplish this is to use the optional "addCommas" parameter in the ToString function. This function converts a numeric value to a string, using a user-specified number of decimal places [numDec]. Though Alteryx doesn’t require the third parameter, if called, the third parameter inserts the appropriate commas (1 or true means format with commas).
ToString([original_value], 2, 1)
Since the new value is a String, simply add in the dollar symbol:
'$'+ToString([original_value], 2, 1)
When importing a delimited file, have you ever run across an error saying ‘Too many fields in record…’ and it’s something like record number 1478254 and you think to yourself, “how the heck am I going to find that record and correct it?”
Well, let us help you with that.
This message is essentially telling us that some particular row of data, when parsed, has more fields that the first row of data (usually caused by an extra delimiter in that row).
The first step in identifying the row is to change the delimiter in the Input Data configuration window. Change the delimiter from the current delimiter to \0 (that is a backslash zero, which stands for no delimiter). This will read in your file but instead of parsing, it’ll just read in the data as is.
Now just add a RecordID tool and Filter tool so we can filter on RecordID 2 (or 1478254) to find the row that is holding us up.
Now that you’ve identified the row that is causing issues, you could go back to your original data, correct it and try importing it again.
If you were looking closely at the example data, you may have noticed that record 4 also had a different number of fields as row 1. After correcting row 2 and importing , again, we would get an error for row 4. Just follow the same procedure to correct additional errors. This could probably be done through an automated/iterative process, but that will have to wait for another time.
(attached is an example workflow created in Alteryx 10.0)
Recently a client reached out to us, saying that "all of a sudden" curly quotes were coming out of his keyboard, and Alteryx was rejecting them as unrecognizable. These are called SmartQuotes, and they are purely aesthetic. This can also happen when cutting/ pasting from other programs. Either way, Alteryx doesn't like them.
To change them back to the standard, useful straight quotes we all know and rely on, the first thing to try is CTRL SHIFT '
This will toggle between quote types.
An extra precaution is to make sure that your 'Regional Language' setting is English (United States); this is different than US English International. To check, in your control panel, go to Region and Language, Keyboards and Languages tab, and click on Change keyboards:
Make sure that your installed services is English (United States). If you do happen to require another input language, under the Advanced Key Settings, it will show you the key sequence to toggle between those keyboards.
This last step is likely unnecessary, but it's a good option to have in case CTRL SHIFT ' doesn't work (and I haven't yet seen it not work).
Now, back to writing formulas and queries with ease!