Challenge #4: Date Parsing
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
For the fourth challenge let’s look at parsing Dates from text strings. To view the previous challenge, click HERE.
A dataset contains a text field that has a date embedded within the text. The problem is that the date is represented a few different ways. For example:
- 16-APR-2005
- Nov•16,•1900
- 4-SEP-00
- Jan•5•2000
The goal is to create a new Date/Time field populated with the dates contained within the text field. You will also need to standardize the dates so that they are all formatted the same.
We have listed this as an advanced exercise since parsing out the dates can be challenging depending on the technique you employ to do it. As always, we love to hear your comments. Have fun!
UPDATE 12/7/2015:
The solution has been uploaded
- Labels:
- Advanced
- Data Analysis
- Difficult
- Parse
- Preparation
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
We updated this exercise with the solution. Be sure to check out this week's exercise.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Thanks Tara. Would it be possible for the analyst the develops the recommended solution to post a brief write up explaining their choices in solving the problem? There's always multiple ways to approach such a problem and I think it's extremely helpful for learning to understand they 'why' behind a solution.
For example, in the first RegEx tool node, 4 output fields are configured, which then get evaluated into a single output via a sequence in a single formula tool. As I examine the RegEx tool, I wonder why 4 outputs were needed? Does each solve for a particular format? Are they designed to be mutually exclusive, etc.
I think of there was just a brief 1-2 sentence explanation for the particular choices, it would greatly enhance understanding of the solution.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hi @dataMack,
Like you said, there is always multiple ways to approach work flows in Alteryx. In the case above there was 4 different formats the dates could be in. Since I'm not an expert with RegEx wanted to keep each in its own column for reformatting downstream (thus the 4). If I was a RegEx master I'm sure there is a single statement to identify all the dates and extract them in a nice clean date-time format.
I will try to provide a bit more explanation on approaches in the future, but keep in mind that there are many ways to solve an exercise and we are only posting an example. There is always another way and more often than not, a better way.
Best Regards,
@GeneR
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
This is the first exercise I've downloaded and looked at so far (we just got our Alteryx licenses). I initially looked at the RegEx syntax and thought "my Lord, how am I ever going to unpack that so that I could ever hope to build one of those myself", and then I noticed the drop-down menu next to the Regular Expression text box. That is seriously wonderful...the fact that you can build a RegEx statement by just selecting the components you want in the order you want. I realize this won't be necessary once I familiarize myself with the syntax more, but having this tool for the novice user is gold. Alteryx is an amazing tool. That's all I wanted to say. :-)
-Mike
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
My perpective was there were variations for only two formats.
(\d{1,2}(\s|-)\D{3,}(\s|-)\d{2,4})
(\D{3}(\s)\d{1,2}.{1,2}\d{2,4})
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
My regex skills are lacking, but here goes:
I then took two streams, the first was to parse the month. To do this, I used regex to match the three character month code, and filtered out those where it did not exist:
The second stream was to isolate the day and year figures, which were numeric. I used the data cleansing tool to remove non-numeric characters and removed null rows. I then dropped all fields but the values and the recordID, I then created a title for the crosstab, and pivoted the data:
I then simply joined the two streams together on the RecordID field, used a formula to add the date elements together into a date string, then formatted the date:
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator