Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Regex Replace - Extract Sub-string from string

PKoya
8 - Asteroid

Hi,

 

I am trying to use Regex replace function to extract simply the file name from the complete output path as below.

 

Input FieldOutput Field
\\mysharedrive.sharedrive.com\Folder1\Folder2\Folder3\Folder4\Folder5\My Report.xlsx|||My ReportMy Report.xlsx

 

This is possible using Text to column and I am able to get the desired result. but I want to reduce the clutter on my workflow and use the Regex Replace method but I could not figure out the syntax.

 

Any help would be appreciated.

5 REPLIES 5
atcodedog05
22 - Nova
22 - Nova

Hi @PKoya 

 

Here is how you can do it.

Workflow:

atcodedog05_0-1646061554964.png

 

Hope this helps : )

 

binuacs
20 - Arcturus

@PKoya 

REGEX_Replace([Input Field], '.*\\(.+)\|\|\|.+', '$1')

binuacs_0-1646061557732.png

 

MarqueeCrew
20 - Arcturus
20 - Arcturus

@PKoya ,

 

Answering with a RegEx_Replace solution as required:

 

REGEX_Replace([Input Field], ".*\\(.*?)\|.*", '$1')

 

Cheers,

 

Mark

Alteryx ACE & Top Community Contributor

Chaos reigns within. Repent, reflect and restart. Order shall return.
Please Subscribe to my youTube channel.
MarqueeCrew
20 - Arcturus
20 - Arcturus

@PKoya ,

 

Along with the request to have this solved using a Regular Expression comes the solution that isn't a regular expression.  I commend your quest for knowledge, but all of the solutions provided give you little guidance as to how they work or if they are implemented well.  I use regex101.com and with it comes an explanation:

 

.*\\(.*?)\|.*  gm (greedy match)
 
.
 matches any character (except for line terminators)
 
* matches the previous token between zero and unlimited times, as many times as possible, giving back as needed (greedy)
\\ matches the character \ with index 9210 (5C16 or 1348) literally (case sensitive)
 
1st Capturing Group 
(.*?)
 
.
 matches any character (except for line terminators)
 
*? matches the previous token between zero and unlimited times, as few times as possible, expanding as needed (lazy)
\| matches the character | with index 12410 (7C16 or 1748) literally (case sensitive)
 
.
 matches any character (except for line terminators)
 
* matches the previous token between zero and unlimited times, as many times as possible, giving back as needed (greedy)
 
Global pattern flags
g modifier: global. All matches (don't return after first match)
m modifier: multi line. Causes ^ and $ to match the begin/end of each line (not only begin/end of string)
 
But still I don't solve a problem like this using RegEx unless I'm just quickly trying to get a filename.  RegEx, once you learn it, codes very quickly.  But it is a dog for performance (apologies to racing dogs).  My solution solves the challenge in 63 steps (re: regex101.com) and the others solve in 71 and 64 steps respectively.
 
But if you use this formula:  

 

 

FileGetFileName(Left([Input Field], FindString([Input Field], "|")))+FileGetExt(Left([Input Field], FindString([Input Field], "|")))

 

 

 

you solve it faster!

 

With 1,000,000 records I tested the regex 3 times:

 

Info: Formula (5): Profile Time: 5,518.52ms, 92.03%
Info: Formula (5): Profile Time: 4,590.08ms, 91.99%
Info: Formula (5): Profile Time: 4,174.64ms, 91.45%

 

The string formula tested with:

 

Info: Formula (3): Profile Time: 2,183.11ms, 83.62%
Info: Formula (3): Profile Time: 2,321.22ms, 84.44%
Info: Formula (3): Profile Time: 2,187.39ms, 84.33%

 

RegEx is generally 2-3 times more time consuming.

 

Cheers,

 

Mark

Alteryx ACE & Top Community Contributor

Chaos reigns within. Repent, reflect and restart. Order shall return.
Please Subscribe to my youTube channel.
PKoya
8 - Asteroid

Thanks, @MarqueeCrew.

 

I will try and have a play around with it and see how it works.

 

I am pretty good with Python Regex but could not workout this specific one.

Labels