In case you missed the announcement: Alteryx One is here, and so is the Spring Release! Learn more about these new and exciting releases here!

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Regex Replace - Extract Sub-string from string

PKoya
8 - Asteroid

Hi,

 

I am trying to use Regex replace function to extract simply the file name from the complete output path as below.

 

Input FieldOutput Field
\\mysharedrive.sharedrive.com\Folder1\Folder2\Folder3\Folder4\Folder5\My Report.xlsx|||My ReportMy Report.xlsx

 

This is possible using Text to column and I am able to get the desired result. but I want to reduce the clutter on my workflow and use the Regex Replace method but I could not figure out the syntax.

 

Any help would be appreciated.

5 REPLIES 5
atcodedog05
22 - Nova
22 - Nova

Hi @PKoya 

 

Here is how you can do it.

Workflow:

atcodedog05_0-1646061554964.png

 

Hope this helps : )

 

binuacs
21 - Polaris

@PKoya 

REGEX_Replace([Input Field], '.*\\(.+)\|\|\|.+', '$1')

binuacs_0-1646061557732.png

 

MarqueeCrew
20 - Arcturus
20 - Arcturus

@PKoya ,

 

Answering with a RegEx_Replace solution as required:

 

REGEX_Replace([Input Field], ".*\\(.*?)\|.*", '$1')

 

Cheers,

 

Mark

Alteryx ACE & Top Community Contributor

Chaos reigns within. Repent, reflect and restart. Order shall return.
Please Subscribe to my youTube channel.
MarqueeCrew
20 - Arcturus
20 - Arcturus

@PKoya ,

 

Along with the request to have this solved using a Regular Expression comes the solution that isn't a regular expression.  I commend your quest for knowledge, but all of the solutions provided give you little guidance as to how they work or if they are implemented well.  I use regex101.com and with it comes an explanation:

 

.*\\(.*?)\|.*  gm (greedy match)
 
.
 matches any character (except for line terminators)
 
* matches the previous token between zero and unlimited times, as many times as possible, giving back as needed (greedy)
\\ matches the character \ with index 9210 (5C16 or 1348) literally (case sensitive)
 
1st Capturing Group 
(.*?)
 
.
 matches any character (except for line terminators)
 
*? matches the previous token between zero and unlimited times, as few times as possible, expanding as needed (lazy)
\| matches the character | with index 12410 (7C16 or 1748) literally (case sensitive)
 
.
 matches any character (except for line terminators)
 
* matches the previous token between zero and unlimited times, as many times as possible, giving back as needed (greedy)
 
Global pattern flags
g modifier: global. All matches (don't return after first match)
m modifier: multi line. Causes ^ and $ to match the begin/end of each line (not only begin/end of string)
 
But still I don't solve a problem like this using RegEx unless I'm just quickly trying to get a filename.  RegEx, once you learn it, codes very quickly.  But it is a dog for performance (apologies to racing dogs).  My solution solves the challenge in 63 steps (re: regex101.com) and the others solve in 71 and 64 steps respectively.
 
But if you use this formula:  

 

 

FileGetFileName(Left([Input Field], FindString([Input Field], "|")))+FileGetExt(Left([Input Field], FindString([Input Field], "|")))

 

 

 

you solve it faster!

 

With 1,000,000 records I tested the regex 3 times:

 

Info: Formula (5): Profile Time: 5,518.52ms, 92.03%
Info: Formula (5): Profile Time: 4,590.08ms, 91.99%
Info: Formula (5): Profile Time: 4,174.64ms, 91.45%

 

The string formula tested with:

 

Info: Formula (3): Profile Time: 2,183.11ms, 83.62%
Info: Formula (3): Profile Time: 2,321.22ms, 84.44%
Info: Formula (3): Profile Time: 2,187.39ms, 84.33%

 

RegEx is generally 2-3 times more time consuming.

 

Cheers,

 

Mark

Alteryx ACE & Top Community Contributor

Chaos reigns within. Repent, reflect and restart. Order shall return.
Please Subscribe to my youTube channel.
PKoya
8 - Asteroid

Thanks, @MarqueeCrew.

 

I will try and have a play around with it and see how it works.

 

I am pretty good with Python Regex but could not workout this specific one.

Labels
Top Solution Authors