I've used the Download tool to pull down the names of the files on our SFTP site. The file names all come back in a single cell. I'm trying to use RegEx to tokenize this data, but I run into a problem when I get to the file name itself, which, as you can imagine, can vary greatly.
I have a situation where I'm trying to tokenize data pulled down using the Download tool.
The results of the FTP download are placed in a single cell. Example below.
-rw-rw---- 1 no-user no-group 4425848 Dec 13 2018 ABC 123.txt
-rw-rw---- 1 no-user no-group 4425848 Dec 13 2018 123 ABC.txt
The RegEx flow I've put together thus far is below.
-[[:alpha:]]{2}-[[:alpha:]]{2}----\s{3}\d\s[[:alpha:]]{2}-[[:alpha:]]{4}\s{2}[[:alpha:]]{2}-[[:alpha:]]{5}\s+\d+\s[[:alpha:]]{3}\s\d+\s{2}\d{4}\s
Resulting in the output shown below
At this point my efforts fall apart because the file names vary. If I add a .+ to the end, it doesn't recognize the line break in the cell.
How can I model the end of my RegEx flow to account for the varying file names so that it also results in said file names being on their own rows in the output?
Hi @G_WOLAK
Use the RegEx tool in Parse mode with this expression:
.*\d{4}\s+(.*?\.\w+)\s*.*
This captures everything after the four-digit year up to and including the file extension (a period followed by word characters).
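For anyone who wants to check the expression outside Alteryx, here is a minimal sketch in Python using the two sample lines from the question. It assumes the cell is split on line breaks first and that Python's `re` module behaves closely enough to the RegEx tool for this pattern; the variable names are only illustrative.

```python
import re

# The two sample lines from the question, as they would appear in one cell.
listing = (
    "-rw-rw---- 1 no-user no-group 4425848 Dec 13 2018 ABC 123.txt\n"
    "-rw-rw---- 1 no-user no-group 4425848 Dec 13 2018 123 ABC.txt\n"
)

# Same expression as in the reply: skip to the four-digit year,
# then capture the file name up to and including its extension.
PATTERN = re.compile(r".*\d{4}\s+(.*?\.\w+)\s*.*")

file_names = []
for line in listing.splitlines():  # assumption: one file per line
    match = PATTERN.match(line)
    if match:
        file_names.append(match.group(1))

for name in file_names:
    print(name)
```

One thing to watch: because the capture is lazy and stops at the first period followed by word characters, a name with more than one period (for example a hypothetical `archive.tar.gz`) would be cut off at `archive.tar`.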
Cheers,
Thank you.
I didn't know how to express a literal period, which I see is done with \.
Thanks for also simplifying the expression.
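To illustrate the point about the period, a small sketch in Python (assumed here only as a convenient way to test; the sample name is taken from the question). An unescaped `.` matches any character, while `\.` matches only a literal period.

```python
import re

name = "ABC 123.txt"

# Unescaped: "." matches any character, so the match starts at the first letter.
unescaped = re.search(r".\w+", name).group(0)

# Escaped: "\." matches only a literal period, so the match is the extension.
escaped = re.search(r"\.\w+", name).group(0)

print(repr(unescaped))
print(repr(escaped))
```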