Alteryx Designer Desktop Discussions

PKoya · ‎02-28-2022

Hi,

I am trying to use Regex replace function to extract simply the file name from the complete output path as below.

Input Field	Output Field
\\mysharedrive.sharedrive.com\Folder1\Folder2\Folder3\Folder4\Folder5\My Report.xlsx\|\|\|My Report	My Report.xlsx

This is possible using Text to column and I am able to get the desired result. but I want to reduce the clutter on my workflow and use the Regex Replace method but I could not figure out the syntax.

Any help would be appreciated.

atcodedog05 · ‎02-28-2022

Hi @PKoya

Here is how you can do it.

Workflow:

Hope this helps : )

binuacs · ‎02-28-2022

@PKoya

REGEX_Replace([Input Field], '.*\\(.+)\|\|\|.+', '$1')

MarqueeCrew · ‎02-28-2022

@PKoya ,

Answering with a RegEx_Replace solution as required:

REGEX_Replace([Input Field], ".*\\(.*?)\|.*", '$1')

Cheers,

Mark

Alteryx ACE & Top Community Contributor

Chaos reigns within. Repent, reflect and restart. Order shall return.
Please Subscribe to my youTube channel.

MarqueeCrew · ‎02-28-2022

@PKoya ,

Along with the request to have this solved using a Regular Expression comes the solution that isn't a regular expression. I commend your quest for knowledge, but all of the solutions provided give you little guidance as to how they work or if they are implemented well. I use regex101.com and with it comes an explanation:

.*\\(.*?)\|.* gm (greedy match)

.

matches any character (except for line terminators)

* matches the previous token between zero and unlimited times, as many times as possible, giving back as needed (greedy)

\\ matches the character \ with index 9210 (5C16 or 1348) literally (case sensitive)

1st Capturing Group

(.*?)

.

matches any character (except for line terminators)

*? matches the previous token between zero and unlimited times, as few times as possible, expanding as needed (lazy)

\| matches the character | with index 12410 (7C16 or 1748) literally (case sensitive)

.

matches any character (except for line terminators)

* matches the previous token between zero and unlimited times, as many times as possible, giving back as needed (greedy)

Global pattern flags

g modifier: global. All matches (don't return after first match)

m modifier: multi line. Causes ^ and $ to match the begin/end of each line (not only begin/end of string)

But still I don't solve a problem like this using RegEx unless I'm just quickly trying to get a filename. RegEx, once you learn it, codes very quickly. But it is a dog for performance (apologies to racing dogs). My solution solves the challenge in 63 steps (re: regex101.com) and the others solve in 71 and 64 steps respectively.

But if you use this formula:

FileGetFileName(Left([Input Field], FindString([Input Field], "|")))+FileGetExt(Left([Input Field], FindString([Input Field], "|")))

you solve it faster!

With 1,000,000 records I tested the regex 3 times:

Info: Formula (5): Profile Time: 5,518.52ms, 92.03%
Info: Formula (5): Profile Time: 4,590.08ms, 91.99%
Info: Formula (5): Profile Time: 4,174.64ms, 91.45%

The string formula tested with:

Info: Formula (3): Profile Time: 2,183.11ms, 83.62%
Info: Formula (3): Profile Time: 2,321.22ms, 84.44%
Info: Formula (3): Profile Time: 2,187.39ms, 84.33%

RegEx is generally 2-3 times more time consuming.

Cheers,

Mark

Alteryx ACE & Top Community Contributor

Chaos reigns within. Repent, reflect and restart. Order shall return.
Please Subscribe to my youTube channel.

PKoya · ‎02-28-2022

Thanks, @MarqueeCrew.

I will try and have a play around with it and see how it works.

I am pretty good with Python Regex but could not workout this specific one.

Alteryx Designer Desktop Discussions

Regex Replace - Extract Sub-string from string