I have a column called filename and the field is formatted as
name1_name2_name3_name4_name5_YYYYMMDDHHMMSS.txt.pgp
name1_name2_name3_name4_YYYYMMDDHHMMSS.txt.pgp
name1_name2_name3_name4_name5_name6_YYYYMMDDHHMMSS.txt.pgp
up until now, the number of names can differ, but the filename always ends with YYYYMMDDHHMMSS.txt.pgp
I've been using the regex function (.+)_ to extract everything before the last _YYYYMMDDHHMMSS.txt.pgp resulting in the names only
but now I have a new field in the column formatted as
name1_name2_name3_name4_ddmmyyyy_HHMMSS.txt.pgp
Using my old regex this results in name1_name2_name3_name4_ddmmyyyy, is there a different regex function I can use that's more dynamic that will work for either of those naming conventions?
in my real data the dates are listed as numbers within the filenames, but for my configuration file its just letters.
Example of the data in my configuration file
name1_name2_name3_name4_name5_YYYYMMDDHHMMSS.txt.pgp
name1_name2_name3_name4_YYYYMMDDHHMMSS.txt.pgp
name1_name2_name3_name4_ddmmyyyy_HHMMSS.txt.pgp
Example of the data in my real data
name1_name2_name3_name4_name5_20240401120000.txt.pgp
name1_name2_name3_name4_20240401120000.txt.pgp
name1_name2_name3_name4_20240401_120000.txt.pgp
I'm trying to create 2 separate regex functions, one for my real data and one for my configuration data that only extracts the filename without the date, if there's one regex function that will work for both, that's fine too, I just don't know if that's possible. Does anyone know how I can accomplish this?
Solved! Go to Solution.
You could try,
(.+?)_\d{6}
...for the real data.
And,
(.+?)_[dmy]{6}
... for the configuration file data
And,
(.+?)_(?:\d{6}|[dmy]{6})
...or (with a regex replace with replace field blank),
_(\d|[ymd]){6}.*
... for both.
Solution attached.
Thank you, that worked! just to make sure I'm understanding behind the scenes, what is that code regex - replace code referencing? as in, how does it know what to replace with blank? I'm trying to understand the logic / process of it all but I'm not that familiar with the syntax.
The `_(\d){6}.*` matches an underscore followed by six digits, and the .* just matches the rest. So it will only replace the _dddddd...and so on stuff. I don't think I'm being very clear. ¯\_(ツ)_/¯