hi powerful regex group,
i would like to parse out the below data into 5 columns, & is the common delimiter however for column name and nature, there will be & as part of the data.
original data
Raw |
HHK SERVICES PTE LTD&SINGAPORE&BZ CONSULTANCY & MGMNT SERVICES&11-Jul-2011&24 |
STRATEGIC FINANCIAL SERVICES PTE LTD&SINGAPORE&CONDUCT TRAINING & SEMINAR FOR OWN FINANCIAL CONSULTANTS&21-Oct-2011&25 |
TIYA PTE. LTD&Singapore&Consultancy&02-May-2019&100 |
Kang Yu Biz&Singapore&Human Resource&28-Aug-2019&100 |
ABC & Real Corporation Pte Ltd&Singapore&Marketing of Financial Services&01-Oct-2009&33 |
C&D Organisation Pte Ltd&Singapore&Event&26-Dec-2019&100 |
K&P PTE.LTD&35 Sennett Lane S466927&NA&20-Apr-2016&50 |
The ideal output,
Name | Location | Nature | Date | Amount |
HHK SERVICES PTE LTD | SINGAPORE | BZ CONSULTANCY & MGMNT SERVICES | 11-Jul-2011 | 24 |
STRATEGIC FINANCIAL SERVICES PTE LTD | SINGAPORE | CONDUCT TRAINING & SEMINAR FOR OWN FINANCIAL CONSULTANTS | 21-Oct-2011 | 25 |
TIYA PTE. LTD | Singapore | Consultancy | 02-May-2019 | 100 |
Kang Yu Biz | Singapore | Human Resource | 28-Aug-2019 | 100 |
ABC & Real Corporation Pte Ltd | Singapore | Marketing of Financial Services | 01-Oct-2009 | 33 |
C&D Organisation Pte Ltd | Singapore | Event | 26-Dec-2019 | 100 |
K&P PTE.LTD | 35 Sennett Lane S466927 | NA | 20-Apr-2016 | 50 |
Please try this using Regex tool
^(.*?)&([A-Za-z ]+)&(.+?)&(\d{2}-[A-Za-z]{3}-\d{4})&(\d+)$
I have made few changes to the code
Please try this
^(.*?)&([^&]+)&([^&]+?)&(\d{2}-[A-Za-z]{3}-\d{2})(\d{2})&(\d+)$
hi, thanks for your solution. i think the 1st row record is not correct. the location column should be Singapore
I think you should use 2 regex tool for your expected output
do you have a solution? i have tried different ways...it doesn't work. the last two columns are easy, but there are & in the name and location column and & is part of the data. i couldn't find a pattern...
Hi @Haokun
One way of doing this using Regex tool.
([a-zA-z &.]+)&(\u+|\d{2}\s.+)&([a-zA-z &]+)&(\d{2}.+)&(\d+)
If this helps, please like this post and mark it as a solution. If you have any other questions, please let us know.
Many thanks
Shanker V
@shanker, it works for the sample set, there are 700 rows of records, some are not matched.. let me have a look at it..there are some special characters in the nature column, such as () / - \
Hi @Haokun
Sure, check and let me know which are not matched.
Might be slight tweaks needed to get those matched.
The regex has been created to match the sample dataset shared.
maybe this?
^(.*LTD.*?|.*?)&(.*?)&(.*)&(.*)&(.*)$
as "&" only generally incurred when come to LTD, we can use that as anchor.
for 1st column, if have LTD, then end with "LTD" (to include "&"), else, end before "&"
edit: turn off greedy ^(.*LTD|.*?)&(.*?)&(.*)&(.*)&(.*)$
edit: check if anything behind Ltd ^(.*LTD.*?|.*?)&(.*?)&(.*)&(.*)&(.*)$
for 2nd column add "?" to turn off greedy.
however, best to change the delimiters if possible.