Advent of Code is back! Unwrap daily challenges to sharpen your Alteryx skills and earn badges along the way! Learn more now.

Weekly Challenges

Solve the challenge, share your solution and summit the ranks of our Community!

Also available in | Français | Português | Español | 日本語
IDEAS WANTED

Want to get involved? We're always looking for ideas and content for Weekly Challenges.

SUBMIT YOUR IDEA

Challenge #4: Date Parsing

GeneR
Alteryx Alumni (Retired)

For the fourth challenge let’s look at parsing Dates from text strings. To view the previous challenge, click HERE.

 

A dataset contains a text field that has a date embedded within the text. The problem is that the date is represented a few different ways. For example:

 

  • 16-APR-2005
  • Nov•16,•1900
  • 4-SEP-00
  • Jan•5•2000

 

The goal is to create a new Date/Time field populated with the dates contained within the text field. You will also need to standardize the dates so that they are all formatted the same.

We have listed this as an advanced exercise since parsing out the dates can be challenging depending on the technique you employ to do it. As always, we love to hear your comments. Have fun!

 

UPDATE 12/7/2015:

The solution has been uploaded

dataMack
12 - Quasar
Thanks for posting these. I'm using them as learning exercises for the analysts on my team. Its fun to see the different approaches someone may take to solve the same problem- especially when they explain why. We all seem to learn new things this way.
TaraM
Alteryx Alumni (Retired)

We updated this exercise with the solution. Be sure  to check out this week's exercise.

Tara McCoy
dataMack
12 - Quasar

Thanks Tara. Would it be possible for the analyst the develops the recommended solution to post a brief write up explaining their choices in solving the problem? There's always multiple ways to approach such a problem and I think it's extremely helpful for learning to understand they 'why' behind a solution.

 

For example, in the first RegEx tool node, 4 output fields are configured, which then get evaluated into a single output via a sequence in a single formula tool. As I examine the RegEx tool, I wonder why 4 outputs were needed? Does each solve for a particular format? Are they designed to be mutually exclusive, etc.

 

undefined

 

I think of there was just a brief 1-2 sentence explanation for the particular choices, it would greatly enhance understanding of the solution.

GeneR
Alteryx Alumni (Retired)

Hi @dataMack,

 

Like you said, there is always multiple ways to approach work flows in Alteryx.  In the case above there was 4 different formats the dates could be in.  Since I'm not an expert with RegEx wanted to keep each in its own column for reformatting downstream (thus the 4).  If I was a RegEx master I'm sure there is a single statement to identify all the dates and extract them in a nice clean date-time format.

 

I will try to provide a bit more explanation on approaches in the future, but keep in mind that there are many ways to solve an exercise and we are only posting an example.  There is always another way and more often than not, a better way.

 

Best Regards,

 

@GeneR

 

 

 

dataMack
12 - Quasar
Thanks @GeneR That's perfect- I was wondering if you specifically made 4 to catch the possible variations only in the source file, or if the statements were designed to work on other variations that may not be present in that source, but possible in another, similar file. Very much appreciate the explanation- thank you!
mix_pix
10 - Fireball

This is the first exercise I've downloaded and looked at so far (we just got our Alteryx licenses).  I initially looked at the RegEx syntax and thought "my Lord, how am I ever going to unpack that so that I could ever hope to build one of those myself", and then I noticed the drop-down menu next to the Regular Expression text box.  That is seriously wonderful...the fact that you can build a RegEx statement by just selecting the components you want in the order you want.  I realize this won't be necessary once I familiarize myself with the syntax more, but having this tool for the novice user is gold.  Alteryx is an amazing tool.  That's all I wanted to say. :-)

 

-Mike

markp201
8 - Asteroid

My perpective was there were variations for only two formats.

 

(\d{1,2}(\s|-)\D{3,}(\s|-)\d{2,4})

 

(\D{3}(\s)\d{1,2}.{1,2}\d{2,4})

 

 

mceleavey
17 - Castor
17 - Castor

My regex skills are lacking, but here goes:

 

Spoiler
I first created a RecordID column, then used regex to parse by every word. I then transposed to create a column with all values and removed null rows:

Initial split and transpose.PNG

I then took two streams, the first was to parse the month. To do this, I used regex to match the three character month code, and filtered out those where it did not exist:

Parse out month strings to record ID.PNG

The second stream was to isolate the day and year figures, which were numeric. I used the data cleansing tool to remove non-numeric characters and removed null rows. I then dropped all fields but the values and the recordID, I then created a title for the crosstab, and pivoted the data:

Parse out day and Year values.PNG

I then simply joined the two streams together on the RecordID field, used a formula to add the date elements together into a date string, then formatted the date:

Join together and format dates.PNG





Bulien

KatieA
Alteryx Alumni (Retired)

Solution attached!