Community Spring Cleaning week is here! Join your fellow Maveryx in digging through your old posts and marking comments on them as solved. Learn more here!

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Regex for Extraction

vvissamsetty
8 - Asteroid

Hi Guys,

I have huge lines of data as below and I'm looking to extract the date, the immediate number following the date and the first letter after the number (C/P), into three separate columns. Can anyone suggest on how to go with this?

Thanks

 

 

E 2015-09-20 110.000 C A OPA

AAA 2013-02-15 22.500 P A OPA

BBBB 2017-04-18 52.000 C A OPA BBBBP1C181303300.U

CC 2020-07-19 195.000 C A OPA

DDDD 2011-03-15 70.000 P A OPA

6 REPLIES 6
ChrisTX
15 - Aurora

can you post the sample data as text, instead of a PNG attachment?

vvissamsetty
8 - Asteroid

Hi @ChrisTX , I posted it to my query above.

Thanks

BPurcell2
9 - Comet

.*(\d{4}-\d{2}-\d{2})\s([\d\.]*).*([CP]).*

 

A RegEx explanation since @vvissamsetty requested -- posting to the thread in case others have a similar question:

 

.  -- Any Character

* -- Repeat 0 or many

     Find any character and repeat until next argument in RegEx String

 

\d Any digit

{4} - Exactly 4 repeating digits

{2} - Exactly 2 repeating digits

- means -

\s space

[\d\.*] -- find any digit or . and as many repeating digits or .  -- brackets mean [] any of the inside characters

\. means . since RegEx counts a . as any character you have to escape special character with a preceding /

[CP] -- find either a C or a P

() -- Capturing Group -- this tells the RegEx tool which character groups to extract into a column -- each () group goes into a separate column

 

You'll notice several instances of .* not in () -- you have to replicate the text using RegEx Syntax in order for it to work.  For example, the space between the date and number must be handled -- you can use \s for a space or a (.*)--wildcard   I prefer using a more specific RegEx Identifier if possible like the space to avoid unintended matching.

 

I use https://regexr.com/ to test more complicated situations, and save time of running through a workflow, then paste it into the tool.

 

Happy to help, since I had a similar question answered on the community many years ago which helped me learn RegEx.

 

Thanks.

 

 

 

 

Thableaus
17 - Castor
17 - Castor

Hi, @vvissamsetty 

 

This should work:

 

REGEX_Replace([Field1], "^.*(\d{4}-\d{2}-\d{2}).*$", "$1")

 

Cheers,

NickSm
Alteryx
Alteryx

@vvissamsetty 

 

At first look, a basic text-to-columns might be the easiest way to do this just using space as a delimiter, but could also RegEx it.

 

Made a quick sample that does both ways.

 

 

vvissamsetty
8 - Asteroid

Thanks guys! It worked.

Labels