Community Spring Cleaning week is here! Join your fellow Maveryx in digging through your old posts and marking comments on them as solved. Learn more here!

Engine Works

Under the hood of Alteryx: tips, tricks and how-tos.
JPKa
Alteryx Alumni (Retired)

Introducing BB Date

 

BB Date is a new Alteryx tool that handles dates so you don't have to. 

 

So_what.jpg

 

Dating Isn't Fun

 

Let's be honest folks.  Dating is never fun, whether we're talking about the romantic variety or the data variety.  However, unlike the romantic variety, dates in data don't get to stop just because you throw an overpriced party filled with family members whose names you can't remember.  Like the story being told by the inebriated wedding patron who grabs the microphone to make a toast when the toasts are supposed to be over, dates in data will never end.  

 

Since we can all agree that our bosses and customers are going to continue to care about the "when" pertaining to our data, Dates and DateTimes are something our Alteryx workflows are just going to have to deal with.  But what about the cases where our data just isn't cooperating?  Now, if you're reading this thinking "Dates are easy.  I never have issues with dates!", I have a message for you:  Stop.  Please.  You are lucky.  Perhaps you have data that is coming straight out of automated systems which all use the same formats.  Enjoy all of your free time. The rest of us toil in the factory of sorrow created by a renegade user in another department who makes the data as messy as possible.  You know who I'm talking about: that user who decides that the date format used by all of the other systems in the company is a recommendation, and fulfills their patriotic duty to use a date format that was only used on their nation's submarines and nowhere else.  

 

Now, assuming you do have data like this, and disparate groups of users sending it to you, you could always ask these users to fix the formatting of the dates, but you consider youself lucky that they're sending you ANYTHING, so you do what we all do: mutter under your breath about "kids these days" while you clean up their mess.  

 

Mutter under your breath no longer.  You have a friend, and that friend is BB Date.  Any resemblance to adorable, spherical robots in certain recent movies is purely conincidental.  

 

BB Date is Here to Help

 

auto_date.jpg

 

What does BB Date do?  Well, let's walk through a scenario that is all too common in the business world.  You've been given a big spreadsheet with several columns that each contain various kinds of dates.  They were copied and pasted into this sheet by dozens of users over the years, and the users were obtaining the data from a plethora of systems, some new, some long-retired, and others unknown.  The data looks like this:

 

Capture1.JPG

 

The Old Way 

 

This is ugly data created for the purposes of this blog post.  Let's examine what happens with this data set with the current Alteryx date tools.  First, we'll grab the DateTime Tool and pull it onto the canvas.  After connecting this ugly data to it's input, we'll see a few things that are a bit tough to swallow:

 

 old_parser_setup.jpg

Let's go ahead and try to use this tool on the "Join Date" column of the data.  The column has mixed formats, so we'll go ahead and pick the format we think is most common.  In this case, the format is "yyyy-MM-dd".  

 

The output is captured below:

 

 old_parser_output.jpg

 

 As you can see, it is up to the user to select a date format for the parser.  This by itself can be difficult for the newer, less technically experienced users.  Adding to this, the parser will only try the format selected.  

 

This is not necessarily a bad thing in certain scenarios.  There are going to be times where a user with a given dataset will not want mixed format dates to be handled at all, because it may be a sign of bad data they want to know about.  However, many times a user knows they have messy data, and they simply want to process it as easily as possible.  

 

There Has To Be a Better Way!

 

foil.gif

 

Enter BB Date.  Let's see it in action below:

 

bb_date_output.jpg

 

 As you can see above, BB Date is pretty self-explanatory.  It just works.  It's not a miracle worker (it expects the date format fed into it to MOSTLY conform with one of the dozen or so standard formats), but it is a great tool for drastically simplifying the process of dealing with dates in Alteryx.  For specific date formats that BB Date will work on, please see the table referenced in this Alteryx documentation here.  (Note:  BB Date doesn't work on certain date formats with only a 2 digit year.  See the Limitations section at the bottom of this article for further explanation.)

 

So How Does It Work?

 

At it's core, this macro is actually made up of a macro within a macro within a macro.  Cue the Inception music. 

 

Below is an image that captures the high-level structure:

BB_Date - New Page (2).png

 

The innermost macro is an Iterative Macro whose sole job is to take a single column of data, draw a random sample of 1000 values, and then run logic to progressively find the most common date format in the column and output the format.  It works recursively, meaning that after the first pass, it ignores the most common format and looks for a second most common format in that same column.  Currently, for performance reasons, it's been limited to make only 3 passes, meaning that it will detect up to 3 different date formats for each column it works on.  It takes in a column of dates as input, and outputs the 1-3 of the most common date formats it detects within that column.  See the image below for details:

 

innermost.jpg 

 

Calling this "Date Detection" iterative macro is a batch macro, which runs this process on the one or more columns a user has selected in the BB Date interface.  Since the iterative macro it contains is only running on a sample with a maximum of 1000 records, the performance remains solid on larger data sets.  For each column, the batch macro outputs the top 3 most common data formats detected. If less than 3 are detected, one or two of these columns will be left with Null values.  It outputs these top 3 formats as columns appended to each value for each column in a simple, standardized key-value structure which allows for maximium flexibility.  See the image below:

 

batch_macro.jpg

 

 

The "outermost" part of the BB Date macro is a Standard Macro which contains the above batch macro.  This macro contains all of the logic for performing the actual conversion of the selected date fields utilizing the 3 most common detected date formats for each field it processes.  Notice that the actual processing uses a series of Formula tools.  These tools perform a series of basic string processing tasks culminating in a final Formula tool which tests the DateTimeParse function on each field with the 3 formats, in order of 1st most common to 3rd most common.  This is a "trick" which takes advantage of the fact that the DateTimeParse function will return "Null" if given an incorrect date format, but will successfully process a date when given the matching date format.  After these key steps, the macro proceeds with some relatively straightforward post-processing steps to ensure that the output data matches the format of the input data as much as possible.  See the image below for details:

 

final_bb_date.jpg

 

Limitations

 

BB Date isn't capable of magic yet, although we plan on implementing the latest versions of sorcery into the product as soon as we can find an ACTUAL WIZARD (Seriously Alteryx recruiters, how hard can it be?  Just look on LinkedIn for someone who attended Hogwarts.   However, please don't hire certain people with red hair and clothes who will burn employees at the stake to boost profits.)  Anyway, since our recruiters have failed in attempts of hiring people competent in supernatural programming as of now, we're stuck with some grim realities of things BB Date can't handle:

 

  1. BB Date needs 4 digit years.  2 digit years are going to be problematic, because 04/08/02 could be interpreted numerous ways.  I really hope you aren't trusting a robot to interpret dates like this for you, because I've never had a boss who was cool with the "blame the robot" excuse.  
  2. BB Date will handle mixed date formats in a column, but is limited to a maximum of 3 different formats in a given column.  This was done for performance reasons, and an arbitrary line had to be drawn somewhere.  

 

Other Options

 

BB Date is one of many tools built by folks in the Alteryx Community that helps with dates.  If you have some date formats that BB Date won't handle, you might want to try a tool built by the bow-tied maestro @AlexKo that helped inspire BB Date.  Alex's tool (found on the Gallery here) isn't as "auto-magic" of a tool as BB Date, but it allows users to specify their own date format, providing for maximum flexibility.   

  

Wrap-Up

 

Hopefully, BB Date is a useful tool that will save you headaches.  More importantly, I hope that you can crack it open, look at the insides, learn from it, and make it even better.  It's not going to be perfect, and I can't wait to see what the community can do to improve it.   

 

I'm looking forward to your feedback, comments, and questions.  Here's to better dating experiences!  

 

BB Date can be downloaded from the Gallery here.  

 

 

JP Kabler
Lead Software Engineer

I'm a data geek with a background in consulting in both the defense and retail sectors, where I used a combination of Python, R, and SQL for helping to build analytic and predictive tools. When I'm not coding, I like to run, hike, lift, and embarrass my kids with dad jokes.

I'm a data geek with a background in consulting in both the defense and retail sectors, where I used a combination of Python, R, and SQL for helping to build analytic and predictive tools. When I'm not coding, I like to run, hike, lift, and embarrass my kids with dad jokes.

Comments