Advent of Code is back! Unwrap daily challenges to sharpen your Alteryx skills and earn badges along the way! Learn more now.

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.

Regex

Haokun
8 - Asteroid

hi powerful regex group,

 

i would like to parse out the below data into 5 columns, & is the common delimiter however for column name and nature, there will be & as part of the data.

 

original data

Raw
HHK SERVICES PTE LTD&SINGAPORE&BZ CONSULTANCY & MGMNT SERVICES&11-Jul-2011&24
STRATEGIC FINANCIAL SERVICES PTE LTD&SINGAPORE&CONDUCT TRAINING & SEMINAR FOR OWN FINANCIAL CONSULTANTS&21-Oct-2011&25
TIYA PTE. LTD&Singapore&Consultancy&02-May-2019&100
Kang Yu Biz&Singapore&Human Resource&28-Aug-2019&100
ABC & Real Corporation Pte Ltd&Singapore&Marketing of Financial Services&01-Oct-2009&33
C&D Organisation Pte Ltd&Singapore&Event&26-Dec-2019&100
K&P PTE.LTD&35 Sennett Lane S466927&NA&20-Apr-2016&50

 

The ideal output,

NameLocationNatureDateAmount
HHK SERVICES PTE LTDSINGAPOREBZ CONSULTANCY & MGMNT SERVICES11-Jul-201124
STRATEGIC FINANCIAL SERVICES PTE LTDSINGAPORECONDUCT TRAINING & SEMINAR FOR OWN FINANCIAL CONSULTANTS21-Oct-201125
TIYA PTE. LTDSingaporeConsultancy02-May-2019100
Kang Yu BizSingaporeHuman Resource28-Aug-2019100
ABC & Real Corporation Pte LtdSingaporeMarketing of Financial Services01-Oct-200933
C&D Organisation Pte LtdSingaporeEvent26-Dec-2019100
K&P PTE.LTD35 Sennett Lane S466927NA20-Apr-201650

 

12 REPLIES 12
BRRLL99
11 - Bolide

Please try this using Regex tool

 

^(.*?)&([A-Za-z ]+)&(.+?)&(\d{2}-[A-Za-z]{3}-\d{4})&(\d+)$

BRRLL99
11 - Bolide

I have made few changes to the code 

Please try this

 

^(.*?)&([^&]+)&([^&]+?)&(\d{2}-[A-Za-z]{3}-\d{2})(\d{2})&(\d+)$

 

BRRLL99_0-1677559291564.png

 

Haokun
8 - Asteroid

hi, thanks for your solution. i think the 1st row record is not correct. the location column should be Singapore

BRRLL99
11 - Bolide

I think you should use 2 regex tool for your expected output

Haokun
8 - Asteroid

do you have a solution? i have tried different ways...it doesn't work. the last two columns are easy, but there are & in the name and location column and & is part of the data. i couldn't find a pattern...

ShankerV
17 - Castor

Hi @Haokun 

 

One way of doing this using Regex tool.

 

([a-zA-z &.]+)&(\u+|\d{2}\s.+)&([a-zA-z &]+)&(\d{2}.+)&(\d+)

 

ShankerV_0-1677565951936.png

ShankerV_0-1677566038877.png

 

If this helps, please like this post and mark it as a solution. If you have any other questions, please let us know.

 

Many thanks

Shanker V

Haokun
8 - Asteroid

@shanker, it works for the sample set, there are 700 rows of records, some are not matched.. let me have a look at it..there are some special characters in the nature column, such as () / - \ 

ShankerV
17 - Castor

Hi @Haokun 

 

Sure, check and let me know which are not matched.

Might be slight tweaks needed to get those matched.

 

The regex has been created to match the sample dataset shared.

PangHC
12 - Quasar

maybe this?

^(.*LTD.*?|.*?)&(.*?)&(.*)&(.*)&(.*)$

 

 

as "&" only generally incurred when come to LTD, we can use that as anchor.  

for 1st column, if have LTD, then end with "LTD" (to include "&"), else, end before "&"

edit: turn off greedy                                 ^(.*LTD|.*?)&(.*?)&(.*)&(.*)&(.*)$

edit: check if anything behind Ltd     ^(.*LTD.*?|.*?)&(.*?)&(.*)&(.*)&(.*)$

 

for 2nd column add "?" to turn off greedy.

however, best to change the delimiters if possible.  

 

Labels