Missed the Q4 Fall Release Product Update? Watch the on-demand webinar for more info on the latest in Designer 24.2, Auto Insights Magic Reports, and more!
Free Trial

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Parse white space

FECL3
8 - Asteroid

Hello

 

I have a set of data (see sample below). I want to parse the data so that there are 6 columns. the 5th and 6th column are either a number or whitespace depending on the position of digit in the string.

 

1000 All Goodwill ASSETS 20500000 
1050 All Amortisation Goodwill ASSETS                          20500000 

 

I have the following expression, however I can't seem to add a capturing group in the 5th group to capture either the digit or additional white space, and then the 6th group capturing the digit or nothing.

 

(\d{4,6})\s*(All)\s(.*)(ASSETS|LIABILITY|OWNERS EQ|INCOME|EXPENSE|BROKER)\s*(\d*)

 

Any help would be much appreciated 

 

 

Thanks

14 REPLIES 14
ShankerV
17 - Castor

Hi @FECL3 

 

Please check are you expecting the result like this.

 

ShankerV_0-1675163972294.png

 

Many thanks

Shanker V

binuacs
21 - Polaris

@FECL3 One way of doing this

 

binuacs_0-1675164087100.png

 

Raj
16 - Nebula

i think this is not what he is looking for

as much i understood he is looking for this

Raj_0-1675166909432.png

 

FECL3
8 - Asteroid

Hi - thanks for your response

 

As mentioned I need to parse the data and not use text to columns. The data above is only example and I can't use a space as a delimiter.

 

The parsing works because everything in between "All" and "(ASSETS|LIABILITY|OWNERS EQ|INCOME|EXPENSE|BROKER)" is the column description and must be all in one column.

 

The result should be as follows (either Field 5 or Field 6 being a digit or Null (or empty / or otherwise))

FECL3_0-1675168521200.png

 

 

DavidSkaife
13 - Pulsar

Hi @FECL3 

 

Here you go, try this Regex:

(\d{4,6})\s*(All)\s(.*)(ASSETS|LIABILITY|OWNERS EQ|INCOME|EXPENSE|BROKER)\s(\d*|\s*)(\s+\d*|\s*)

 

I've stuck a Data Cleansing tool on to remove the duplicate whitespace and you get this:

 

DavidSkaife_0-1675168684200.png

 

FECL3
8 - Asteroid

Thanks @DavidSkaife - very useful however I fear I have oversimplified the data. When I add this to my workflow it doesn't work because there is more than one space character (\s) between the 5th and 6th capturing group. 

 

The data actually looks like this and as you can see there is multiple white space characters between "Assets" and the value. However the number of whitespace characters differ (this number also differs across the rest of my data set so can't define a value) depending on whether I want it in the 5th or 6th capturing group. 

 

 

FECL3_0-1675169170395.png

 

To hopefully add some more context I have added a snip from Regex101. The yellow highlighted section should come through as the 5th capturing group, whilst the proceeding number should be in the 6th capturing group. 

FECL3_1-1675169357529.png

 

 

 

Raj
16 - Nebula

can you share 5 line of un simplified data , so we can try new logic for same

DavidSkaife
13 - Pulsar

Hi @FECL3 

 

Ok try this:

(\d{4,6})\s*(All)\s(.*)(ASSETS|LIABILITY|OWNERS EQ|INCOME|EXPENSE|BROKER)\s(\d*|\s+)\s(\s+\d*|\s*)

 

However i suspect that if there is no structure as to the number of whitespace after ASSETS it's going to be tricky to find a Regex that works in one go.

 

For the 5th group, will it always be 8 characters/8 whitespace?

How do you know if the final value is in the 5th or 6th based on the structure?

 

As @Raj said sharing a subset of your data may be helpful for us to troubleshoot for you

FECL3
8 - Asteroid

Thank you for the suggestions. Please see example data below in the attached workflow.

 

Please note I have changed the regex to as the numbers have commas and full stops in. 

 

(\d{4,6})\s*(All)\s(.*)(ASSETS|LIABILITY|OWNERSEQ|INCOME|EXPENSE|BROKER)\s*(\d{1,3},\d{1,3}.\d{2}|\d{1,3}.\d{2}|\d{1,3},\d{1,3},\d{1,3}.\d{2})

 

The 5th group will have a range of 3 to 11 characters in.

 

I know the final value is in the 6th structure because it sits within a different column in the source data.

Labels
Top Solution Authors