Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Parse white space

FECL3
8 - Asteroid

Hello

 

I have a set of data (see sample below). I want to parse the data so that there are 6 columns. the 5th and 6th column are either a number or whitespace depending on the position of digit in the string.

 

1000 All Goodwill ASSETS 20500000 
1050 All Amortisation Goodwill ASSETS                          20500000 

 

I have the following expression, however I can't seem to add a capturing group in the 5th group to capture either the digit or additional white space, and then the 6th group capturing the digit or nothing.

 

(\d{4,6})\s*(All)\s(.*)(ASSETS|LIABILITY|OWNERS EQ|INCOME|EXPENSE|BROKER)\s*(\d*)

 

Any help would be much appreciated 

 

 

Thanks

14 REPLIES 14
DavidSkaife
13 - Pulsar

Hi @FECL3 

 

Which rows have the numerical data that should sit in field 6?

 

EDIT: Other than you knowing the data sits in field 6, is there any other way of knowing from the data? If there isn't it's going to be nigh on impossible to translate that into logic..

FECL3
8 - Asteroid

They will sit in rows 2,3,4,6 and 7

 

It's difficult as ASSETS, EXPENSE will generally be in the 5th capturing group, LIABILITIES, OWNERSEQ, and INCOME will be in the 6th capturing group. However this is not always the case and as sometimes they aren't (for instance row 2 and row 5), hence the desire to try capture them based on their position relative to the amount of whitespace in the data.  

DavidSkaife
13 - Pulsar

Hi @FECL3 

 

This is horrible code, but, try this:

 

 

 

(\d{4,6})\s*(All)\s(.*)(ASSETS|LIABILITY|OWNERSEQ|INCOME|EXPENSE|BROKER)\s{23,27}(\d{1,3},\d{1,3}.\d{2}|\d{1,3}.\d{2}|\d{1,3},\d{1,3},\d{1,3}.\d{2}|\s*)\s*(\d{1,3},\d{1,3}.\d{2}|\d{1,3}.\d{2}|\d{1,3},\d{1,3},\d{1,3}.\d{2}|\s*)

 

 

 

It seems to work on the sample you sent, but i've had to make some assuptions based on the number of whitespace you have after the ASSETS/LIABILITIES group.

 

You can define an upper and lower limits in Regex, to account for variations in whitespace. So what i did was find the number for each row using

 

 

REGEX_CountMatches([Field1], '\s')

 

 

 which returns this:

DavidSkaife_0-1675177277678.png

 

I plug this into the Regex code you provided and replace the \s+ with \s{23,27} just before the numerical regex code to account for the min/max value before the first set of numerical data (these upper/lower values may need to be changed depending on your whole dataset). Subsequently i repeat the same batch of Regex to find the 6th field but this time i leave the separator as \s*

I also added in |\s* at the end of the numerical code as both field 5 or field 6 can be whitespace.

 

 

(\d{4,6})
\s*(All)
\s(.*)
(ASSETS|LIABILITY|OWNERSEQ|INCOME|EXPENSE|BROKER)
\s{23,27}(\d{1,3},\d{1,3}.\d{2}|\d{1,3}.\d{2}|\d{1,3},\d{1,3},\d{1,3}.\d{2}|\s*)
\s*(\d{1,3},\d{1,3}.\d{2}|\d{1,3}.\d{2}|\d{1,3},\d{1,3},\d{1,3}.\d{2}|\s*)

 

 

There is probably a way of making this a bit tidier...but you do appear to get the right output:

 

DavidSkaife_1-1675177584858.png

 

 

FECL3
8 - Asteroid

Thank you very much - a neat and useful solution. Thank you for the explanation too - I have had to edit your code slightly to fit the different permeations across the whole dataset - but it has worked well!

DavidSkaife
13 - Pulsar

Hi @FECL3 

 

I'm glad it works, and i learnt something new about the upper/lower parameters so it's a win all round :)

Labels