Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Regex for Extraction

vvissamsetty
8 - Asteroid

Hi Guys,

I have huge lines of data as below and I'm looking to extract the date, the immediate number following the date and the first letter after the number (C/P), into three separate columns. Can anyone suggest on how to go with this?

Thanks

 

 

E 2015-09-20 110.000 C A OPA

AAA 2013-02-15 22.500 P A OPA

BBBB 2017-04-18 52.000 C A OPA BBBBP1C181303300.U

CC 2020-07-19 195.000 C A OPA

DDDD 2011-03-15 70.000 P A OPA

6 REPLIES 6
ChrisTX
15 - Aurora

can you post the sample data as text, instead of a PNG attachment?

vvissamsetty
8 - Asteroid

Hi @ChrisTX , I posted it to my query above.

Thanks

BPurcell2
9 - Comet

.*(\d{4}-\d{2}-\d{2})\s([\d\.]*).*([CP]).*

 

A RegEx explanation since @vvissamsetty requested -- posting to the thread in case others have a similar question:

 

.  -- Any Character

* -- Repeat 0 or many

     Find any character and repeat until next argument in RegEx String

 

\d Any digit

{4} - Exactly 4 repeating digits

{2} - Exactly 2 repeating digits

- means -

\s space

[\d\.*] -- find any digit or . and as many repeating digits or .  -- brackets mean [] any of the inside characters

\. means . since RegEx counts a . as any character you have to escape special character with a preceding /

[CP] -- find either a C or a P

() -- Capturing Group -- this tells the RegEx tool which character groups to extract into a column -- each () group goes into a separate column

 

You'll notice several instances of .* not in () -- you have to replicate the text using RegEx Syntax in order for it to work.  For example, the space between the date and number must be handled -- you can use \s for a space or a (.*)--wildcard   I prefer using a more specific RegEx Identifier if possible like the space to avoid unintended matching.

 

I use https://regexr.com/ to test more complicated situations, and save time of running through a workflow, then paste it into the tool.

 

Happy to help, since I had a similar question answered on the community many years ago which helped me learn RegEx.

 

Thanks.

 

 

 

 

Thableaus
17 - Castor
17 - Castor

Hi, @vvissamsetty 

 

This should work:

 

REGEX_Replace([Field1], "^.*(\d{4}-\d{2}-\d{2}).*$", "$1")

 

Cheers,

NickSm
Alteryx
Alteryx

@vvissamsetty 

 

At first look, a basic text-to-columns might be the easiest way to do this just using space as a delimiter, but could also RegEx it.

 

Made a quick sample that does both ways.

 

 

vvissamsetty
8 - Asteroid

Thanks guys! It worked.

Labels