Alteryx Designer Desktop Discussions

FECL3 · ‎01-31-2023

Hello

I have a set of data (see sample below). I want to parse the data so that there are 6 columns. the 5th and 6th column are either a number or whitespace depending on the position of digit in the string.

1000 All Goodwill ASSETS 20500000
1050 All Amortisation Goodwill ASSETS 20500000

I have the following expression, however I can't seem to add a capturing group in the 5th group to capture either the digit or additional white space, and then the 6th group capturing the digit or nothing.

(\d{4,6})\s*(All)\s(.*)(ASSETS|LIABILITY|OWNERS EQ|INCOME|EXPENSE|BROKER)\s*(\d*)

Any help would be much appreciated

Thanks

DavidSkaife · ‎01-31-2023

Hi @FECL3

Which rows have the numerical data that should sit in field 6?

EDIT: Other than you knowing the data sits in field 6, is there any other way of knowing from the data? If there isn't it's going to be nigh on impossible to translate that into logic..

FECL3 · ‎01-31-2023

They will sit in rows 2,3,4,6 and 7

It's difficult as ASSETS, EXPENSE will generally be in the 5th capturing group, LIABILITIES, OWNERSEQ, and INCOME will be in the 6th capturing group. However this is not always the case and as sometimes they aren't (for instance row 2 and row 5), hence the desire to try capture them based on their position relative to the amount of whitespace in the data.

DavidSkaife · ‎01-31-2023

Hi @FECL3

This is horrible code, but, try this:

(\d{4,6})\s*(All)\s(.*)(ASSETS|LIABILITY|OWNERSEQ|INCOME|EXPENSE|BROKER)\s{23,27}(\d{1,3},\d{1,3}.\d{2}|\d{1,3}.\d{2}|\d{1,3},\d{1,3},\d{1,3}.\d{2}|\s*)\s*(\d{1,3},\d{1,3}.\d{2}|\d{1,3}.\d{2}|\d{1,3},\d{1,3},\d{1,3}.\d{2}|\s*)

It seems to work on the sample you sent, but i've had to make some assuptions based on the number of whitespace you have after the ASSETS/LIABILITIES group.

You can define an upper and lower limits in Regex, to account for variations in whitespace. So what i did was find the number for each row using

REGEX_CountMatches([Field1], '\s')

which returns this:

I plug this into the Regex code you provided and replace the \s+ with \s{23,27} just before the numerical regex code to account for the min/max value before the first set of numerical data (these upper/lower values may need to be changed depending on your whole dataset). Subsequently i repeat the same batch of Regex to find the 6th field but this time i leave the separator as \s*

I also added in |\s* at the end of the numerical code as both field 5 or field 6 can be whitespace.

(\d{4,6})
\s*(All)
\s(.*)
(ASSETS|LIABILITY|OWNERSEQ|INCOME|EXPENSE|BROKER)
\s{23,27}(\d{1,3},\d{1,3}.\d{2}|\d{1,3}.\d{2}|\d{1,3},\d{1,3},\d{1,3}.\d{2}|\s*)
\s*(\d{1,3},\d{1,3}.\d{2}|\d{1,3}.\d{2}|\d{1,3},\d{1,3},\d{1,3}.\d{2}|\s*)

There is probably a way of making this a bit tidier...but you do appear to get the right output:

FECL3 · ‎01-31-2023

Thank you very much - a neat and useful solution. Thank you for the explanation too - I have had to edit your code slightly to fit the different permeations across the whole dataset - but it has worked well!

DavidSkaife · ‎01-31-2023

Hi @FECL3

I'm glad it works, and i learnt something new about the upper/lower parameters so it's a win all round :)

Alteryx Designer Desktop Discussions

Parse white space