Maveryx Success Stories

Learn how Alteryx customers transform their organizations using data and analytics.
STORIES WANTED

Showcase your achievements in the Maveryx Community by submitting a Success Story now!

SUBMISSION INSTRUCTIONS

Turning Up Your Process

cbalas
Alteryx Alumni (Retired)

logo.png

Overview of Use Case:

At Alteryx, we acknowledge the value in highlighting some of our strongest advocates, and thus have created the Alteryx ACE Program. The ACE program highlights a few of the most influential, supportive experts of Alteryx Analytics and recognizes them for the meaningful impact they are making, not just inside of their organizations — but across the analytics community. In this case the ACE discusses how to accomplish things faster by using Alteryx to do Delta Processing, CASSing, Geocoding, and the importance around these capabilities.

 

Describe the business challenge or problem you needed to solve

The following example is about a master data management project that we were working on with a marketing company a few years ago. They had six different data sources coming in from different places. It was survey data like a card you fill out at a trade show. We noticed we had a lot of similar contact data and we wanted to create three of these master records so that we aren't contacting them four or five times because we've got them in our database four or five times.

 

Describe the Solution

So the first step was to identify the similarities between disparate data sets and clean them so that we can make comparisons. We started by using four different IDs. We had a building ID, so we pinpointed everything to a location ID, an address ID for a specific business, a household ID which helped us with groups of people at one location, and individual ID’s to account for people moving from a current or prior residence. We needed filters to make sure we weren’t doubling or tripling our efforts.

clipboard_image_0.png

To further filter our data we needed to create a key where one didn’t exist, so we created hash values in Alteryx. Hashing is a form of cryptography that will take string data and it will convert it into a 32 character hexadecimal value. If you think about it, you've got 32 characters, and those characters could potentially be 16 different things. So as it's creating this key, the likelihood of any two string values having the exact same hash value at the end of it is incredibly slim to a point that for key building we found it to be precise. To do this in Alteryx there's two formulas, one's called the ND5 ascii, and one's MD5 Unicode.

clipboard_image_1.png

The ascii works with, if you've got fields that have ascii characters in it, and if you don't you can use the Unicode formula. MD5 is commonly used as key generation and data validation. So that's why we find it to be okay here in this software. We’re going to compartmentalize it.

 

Now with your address info, the first thing you want to do is pre-clean the data. You’ll want to look at common address problems and create your own separate database in order to to do find and replace and update records before we hit any of the heavy lifting with tools like CASS. So for example this little data set I have here, you can see 15 different ways to write street.

clipboard_image_2.png

Luckily as you go through this and if you're starting to store these versions of street that you're seeing, or other words/terms that have multiple variations, you can put this back into an Alteryx file somewhere on a shared address that everyone can refer to when doing this cleaning. I recommend learning the RegEx function in Designer because it's super powerful for any sort of string parsing. And sometimes you'll get your data/results in a single column while cleaning. So you will need to build out processes that can split that up, or if you are using the hash key, you can create a single value from whatever exists in one of those. If you have a data set that's coming in with five columns, you would just create a formula that brings them all together or hashes them.

 

Sometimes you'll find overlapping hash value returns, because it's just a concatenation, essentially, of a string. To help further with the reduction process there are two tools in Alteryx you’ll use for cleaning, first the CASS tools and then the Geocoder. They both have their strengths. CASS does most everything you're going to need to for address based on United States Postal Service data set. It's got things even like census boundaries and postal routes. The street Geocoder return long values.

clipboard_image_4.png

I would suggest using CASS for most of your address cleansing things. Catching as much as we can. So the next step here is that as you build these out it generates a list of acceptable error codes or acceptable solutions, and if you're going to push them through or not you will also want to create a matching hierarchy. There will be multiple levels of using the Fuzzy Match tool. Things will match on your first level. They'll match on your second level, and may have a match on your third level and if they do match at multiple levels, you'll want to be able to take the best match and score. You can then store everything locally, and then if you're starting to see problems in your data on the other side you can pinpoint directly where these matches are happening and then go back and adjust your logic as needed.

clipboard_image_5.png

You may be wondering why this whole concept of using the hashing and key values is important. I'm sure every organization faces this on some level. If you've got Server or if you've got some sort of automation process you're fighting for space pretty much at all time. I know we've been using it at my current organization for eight months, and we've already got 70 jobs running overnight. Some of them take a long time. Now if you've already done the hash you can just assign whatever values needed through your CASS and Fuzzy Match back to the new records.

 clipboard_image_6.png

With hashing, CASS, and Fuzzy Match you can avoid waiting for that whole scheduled automation process to finish to get results immediately. However, that scheduling process did help. We were able to avoid a lot of reprocessing of records and were able to narrow them down to distinct values.

 

Describe the benefits you have achieved
As I've gone through my life in Alteryx, I'm at this point where every time I use Alteryx I just want to make everything faster. For example, hashing which is actually a really cool concept, can even be done in Alteryx.

You can use “fuzzy matching Alteryx.” It's really intimidating at first but once you get into it you're like, "Oh, this all kind of makes sense." And it can provide a lot of really great opportunities for your organization. Part of the fuzzy matching logic is that there are actually nicknames built in Alteryx. The Fuzzy Match tool has a table of nicknames that correspond to full names, so you can choose an option to match names to nicknames.

 

Related Resources

The entire PowerPoint presentation can be found here