Parse SFTP file listing with REGEX
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
I am trying to parse an SFTP file listing to columns to extract file size, timestamp, and file name.
Here is a sample of my data and desired output:
DownloadData | Statuses | Size | Timesamp | File/DIR Name |
drwxrwxrwx 1 0 0 0 Nov 6 04:44 . | drwxrwxrwx 1 0 0 | 0 | Nov 6 04:44 | . |
-rw-rw-rw- 1 0 0 50445 Oct 10 12:01 employeejobrates.csv | -rw-rw-rw- 1 0 0 | 50445 | Oct 10 12:01 | employeejobrates.csv |
drwxrwxrwx 1 0 0 0 Nov 5 12:47 HotSchedulesPickup | drwxrwxrwx 1 0 0 | 0 | Nov 5 12:47 | HotSchedulesPickup |
-rw-rw-rw- 1 0 0 184313 Aug 29 10:17 NEU18114_Employee Job Rate Report-untitled#20190822(NEUADMIN)-623-2043.csv | -rw-rw-rw- 1 0 0 | 184313 | Aug 29 10:17 | NEU18114_Employee Job Rate Report-untitled#20190822(NEUADMIN)-623-2043.csv |
-rw-rw-rw- 1 0 0 107904 Nov 2 10:22 NEU18114_Employees#19000101(NEUADMIN)-1009-2246.csv | -rw-rw-rw- 1 0 0 | 107904 | Nov 2 10:22 | NEU18114_Employees#19000101(NEUADMIN)-1009-2246.csv |
-rw-rw-rw- 1 0 0 637 Nov 6 04:38 nightlyfinfile_20191106_0743.csv | -rw-rw-rw- 1 0 0 | 637 | Nov 6 04:38 | nightlyfinfile_20191106_0743.csv |
-rw-rw-rw- 1 0 0 63278 Oct 16 08:24 nightlyjobfile20191016.csv | -rw-rw-rw- 1 0 0 | 63278 | Oct 16 08:24 | nightlyjobfile20191016.csv |
-rw-rw-rw- 1 0 0 162304 May 10 07:25 PAYROLLAUDIT_030719.XLS | -rw-rw-rw- 1 0 0 | 162304 | May 10 07:25 | PAYROLLAUDIT_030719.XLS |
-rw-rw-rw- 1 0 0 127488 Jul 16 10:04 reports - 2019-07-16T120342.516.xls | -rw-rw-rw- 1 0 0 | 127488 | Jul 16 10:04 | reports - 2019-07-16T120342.516.xls |
-rw-rw-rw- 1 0 0 51847 Aug 12 19:47 reports - 2019-08-12T214634.579.xlsx | -rw-rw-rw- 1 0 0 | 51847 | Aug 12 19:47 | reports - 2019-08-12T214634.579.xlsx |
-rw-rw-rw- 1 0 0 185246 Aug 22 07:19 reports - 2019-08-22T091912.260.csv | -rw-rw-rw- 1 0 0 | 185246 | Aug 22 07:19 | reports - 2019-08-22T091912.260.csv |
-rw-rw-rw- 1 0 0 116682 Aug 22 07:23 reports - 2019-08-22T092305.581.csv | -rw-rw-rw- 1 0 0 | 116682 | Aug 22 07:23 | reports - 2019-08-22T092305.581.csv |
-rw-rw-rw- 1 0 0 4942674 Jul 18 13:43 Schedule-2019-07-18.csv | -rw-rw-rw- 1 0 0 | 4942674 | Jul 18 13:43 | Schedule-2019-07-18.csv |
-rw-rw-rw- 1 0 0 3847995 Jul 31 2018 12:35 Schedule-2019-07-31T19-34-51-001.csv | -rw-rw-rw- 1 0 0 | 3847995 | Jul 31 2018 12:35 | Schedule-2019-07-31T19-34-51-001.csv |
I have attempted to use text to columns with a space delimiter as described in the link below. That will work in most situations but not when there is a space in the file name or when the year is different than the current year.
My regex skills are not up to par in order to parse this data to where I need it. Any help would be greatly appreciated.
Thanks,
James Allen
Solved! Go to Solution.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hi @jallen201,
Hopefully this is what you were looking for? I also used a multi field formula to remove duplicate spaces and replace with just one, not sure whether you also need that.
My regex function was ^(.*?\s+\d\s+\d\s+\d)\s+(\d+)\s+([A-Z][a-z]{2}\s+\d{1,2}(?:\s+\d{4})?\s+\d{2}:\d{2})\s+(.*)$
If this solves your issue please mark the answer as correct, if not let me know!
Regards,
Jonathan
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
This worked perfectly... Now to dissect the regex on my own and learn it so I can engineer my own solution next time.
Thanks,
James Allen
