This site uses different types of cookies, including analytics and functional cookies (its own and from other sites). To change your cookie settings or find out more, click here. If you continue browsing our website, you accept these cookies.
Hello all,
I am trying to do some file name validations and I need some help in parsing out just the filename and the date.
I essentially need the output to be as follows:
File Name | Date |
AL | 01312017 |
BAR | 01312017 |
BR | 01312017 |
BRT | 01312017 |
HDR | 01312017 |
Test | 01312017 |
TL | 01312017 |
TT | 01312017 |
Text-to-Column would typically work using "_" as the delimiter, but this can cause issues if the user inputs the file name wrongfully (i.e. misses an "_" or inputs "__").
Any assistance would be appreciated!
Thanks
Solved! Go to Solution.
I've used a regex here, let me know if it helps, or if you need some clarification of what's happening:
Wow you are a life saver. If you don't mind would it be possible to breakdown the RegEx and provide explanation?
\d\_*([A-Z]+)\_*(\d+)\_*.*
If it's too much work, no worries.
Thank you again, much appreciated.
No problem!
"\d\_*" says: look for a digit, followed by any number of underscores. The "\" before the underscore tells the RegEx to look for the literal character "_". This means that it would match: "6_", "5___", "1"
The round brackets are what's called a Capturing Group, meaning, parse this bit out into a new column.
"([A-Z]+)\_*" says: One or more capital letters, followed by any number of underscores. Notice that the \_ is outside of the capturing group, so that we do not keep this in our parsed column.
"(\d+)\_*.*" says: One or more digits, followed by any number of underscores, followed by any other characters.
Hopefully this was clear, let me know if there's anything you want me to expand on.
What if in cases where I wanted to split into 3 columns instead?
Firm Name | File Name | Date |
Alberta_Counsel_15175 | AL | 01312017 |
Alberta_Counsel_15175 | BAR | 01312017 |
Alberta_Counsel_15175 | BR | 01312017 |
Alberta_Counsel_15175 | BRT | 01312017 |
Alberta_Counsel_15175 | HDR | 01312017 |
Alberta_Counsel_15175 | Test | 01312017 |
Alberta_Counsel_15175 | TL | 01312017 |
Alberta_Counsel_15175 | TT | 01312017 |
In that case, you'd use:
"(.*\d)\_*([A-Z]+)\_*(\d+)\_*.*"
The difference is that I've added an additional capturing group (), with .* before the \d, meaning that it will catch any number of any characters, ending with a digit.
Is this what you were looking for?