Alteryx Designer Desktop Discussions

FECL3 · ‎01-31-2023

Hello

I have a set of data (see sample below). I want to parse the data so that there are 6 columns. the 5th and 6th column are either a number or whitespace depending on the position of digit in the string.

1000 All Goodwill ASSETS 20500000
1050 All Amortisation Goodwill ASSETS 20500000

I have the following expression, however I can't seem to add a capturing group in the 5th group to capture either the digit or additional white space, and then the 6th group capturing the digit or nothing.

(\d{4,6})\s*(All)\s(.*)(ASSETS|LIABILITY|OWNERS EQ|INCOME|EXPENSE|BROKER)\s*(\d*)

Any help would be much appreciated

Thanks

ShankerV · ‎01-31-2023

Hi @FECL3

Please check are you expecting the result like this.

Many thanks

Shanker V

binuacs · ‎01-31-2023

@FECL3 One way of doing this

Raj · ‎01-31-2023

i think this is not what he is looking for

as much i understood he is looking for this

FECL3 · ‎01-31-2023

Hi - thanks for your response

As mentioned I need to parse the data and not use text to columns. The data above is only example and I can't use a space as a delimiter.

The result should be as follows (either Field 5 or Field 6 being a digit or Null (or empty / or otherwise))

DavidSkaife · ‎01-31-2023

Hi @FECL3

Here you go, try this Regex:

(\d{4,6})\s*(All)\s(.*)(ASSETS|LIABILITY|OWNERS EQ|INCOME|EXPENSE|BROKER)\s(\d*|\s*)(\s+\d*|\s*)

I've stuck a Data Cleansing tool on to remove the duplicate whitespace and you get this:

FECL3 · ‎01-31-2023

Thanks @DavidSkaife - very useful however I fear I have oversimplified the data. When I add this to my workflow it doesn't work because there is more than one space character (\s) between the 5th and 6th capturing group.

The data actually looks like this and as you can see there is multiple white space characters between "Assets" and the value. However the number of whitespace characters differ (this number also differs across the rest of my data set so can't define a value) depending on whether I want it in the 5th or 6th capturing group.

To hopefully add some more context I have added a snip from Regex101. The yellow highlighted section should come through as the 5th capturing group, whilst the proceeding number should be in the 6th capturing group.

Raj · ‎01-31-2023

can you share 5 line of un simplified data , so we can try new logic for same

DavidSkaife · ‎01-31-2023

Hi @FECL3

Ok try this:

(\d{4,6})\s*(All)\s(.*)(ASSETS|LIABILITY|OWNERS EQ|INCOME|EXPENSE|BROKER)\s(\d*|\s+)\s(\s+\d*|\s*)

However i suspect that if there is no structure as to the number of whitespace after ASSETS it's going to be tricky to find a Regex that works in one go.

For the 5th group, will it always be 8 characters/8 whitespace?

How do you know if the final value is in the 5th or 6th based on the structure?

As @Raj said sharing a subset of your data may be helpful for us to troubleshoot for you

FECL3 · ‎01-31-2023

Thank you for the suggestions. Please see example data below in the attached workflow.

Please note I have changed the regex to as the numbers have commas and full stops in.

(\d{4,6})\s*(All)\s(.*)(ASSETS|LIABILITY|OWNERSEQ|INCOME|EXPENSE|BROKER)\s*(\d{1,3},\d{1,3}.\d{2}|\d{1,3}.\d{2}|\d{1,3},\d{1,3},\d{1,3}.\d{2})

The 5th group will have a range of 3 to 11 characters in.

I know the final value is in the 6th structure because it sits within a different column in the source data.

Alteryx Designer Desktop Discussions

Parse white space

Zero to Advanced in 20 days

No data in configuration pane of tool immediately ...

Re: How to separate IDs from the text

Re: Zero to Advanced in 20 days

Re: Zero to Advanced in 20 days