RegEx - Parsing Date from File Names
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hi All,
I am relatively new to using Regular Expressions, especially within Alteryx. I am trying to pull out the date from the file name.
For more context the file names themselves always have the date in one manor or another. There are several different ways that the files are named given that I am dealing with several thousand files across multiple buildings and different people that created them. The file names contain additional information sometimes, so I want to make sure to only parse the date portion. I got this formula to work for all of the different date formats that I have seen used on my files, I am not able to get this to work correctly when I try adding it to the RegEx tool in Alteryx.
I am trying to create a new string field that contains only the date portion of the file name. I will then be able to use that later in the workflow and convert from a strong to a date field. Any advice on where I am going wrong would be greatly appreciated. Thanks.
(\d{1,4})(\.|-)(\d{1,2})(\.|-*)(\d{0,4})
Example Date Formats:
01.06.2020
01.06.20
1.6.2020
1.6.20
01-06-2020
01-06-20
1-6-20
01.06
1.6
01-06
1-6
Solved! Go to Solution.
- Labels:
- Regex
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
If you change your expression to this
(\d{1,4})[\.|-](\d{1,2})[\.|-]*(\d{0,4})
you should get all the pieces parsed. I removed the capture groups from the separators and made the second one optional. The attached workflow uses a Regex Parse tool to split the dates into 3 fields
Dan
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Your RegEx statement works, but there are couple additional steps needed to make it work for Alteryx. I'm assuming you had the RegEx tool configured to Tokenize the Output Method, and received an error about "ParseSimple mode can only have 0 or 1 Marked sections". Your RegEx statement has several different configurations it is trying to capture, and Tokenize cannot handle multiples in Alteryx.
Solution:
1. Configure the RegEx Tool to Parse for the Output Method.
2. We now have 5 columns broken out. RegExOut1, 3, and 5 contain our Month, Day, and Year respectively. But since you have cases where only month and day exist, we need to account to that before converting them to a Date. So I'm using a Multi-Row Formula tool to fill in the missing years.
3. We can now add a formula tool in to concatenate the fields together to give us a date stored as a string.
4. And finally, I'm using a tool called BB Date to dynamically convert all the string dates to an actual Alteryx date and then you can use a Select tool to deselect all the RegExOut columns.
The BB Date tool is available for download on the Alteryx Community Gallery: https://gallery.alteryx.com/#!app/BB-Date/577fe8aba248970840e6dd4d
I've attached an example workflow for you, but you will need to install the BB Date tool prior to opening for it to work.
If this solves your issue please mark the answer as correct, if not let me know!
Thanks!
Phil
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hi @Maskell_Rascal & @danilang
Thanks for getting back to me. That helped me understand everything a lot better. I ended up incorporating parts from both of your solutions. @danilang I didn't end up using the macro for date format as this is being used inside of an already complex batch macro. I just padded each of the date attributes so that the combined date string will always be the same format before passing to the DateTime tool .
I went ahead and modified the workflow that @Maskell_Rascal provided in case this solution is of any help to anyone else. Thanks again for the assistance!