Community Spring Cleaning week is here! Join your fellow Maveryx in digging through your old posts and marking comments on them as solved. Learn more here!

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Help with regular expression to split a column into multiple rows with a primary key

craigja
8 - Asteroid

Hi folks,

 

I have a challenge!  I've sat and looked at this all morning but cant get it to work correctly but I'm sure at least one of you will be up for it 🙂

 

The data below needs reformatted, it is in one column and in another column I have a primary key (Publication ID).  The numbers seem to identify the publisher but I'm not interested in the number, only in using it to split out the data, into separate rows with the primary key.  So the pattern I'm looking for is:  ignore the number, but take what comes after the number, and before the next number, the first section between the commas is the acronym, I want that, then the next section (before the start of the next number) should be in brackets, which is the full name of the publisher.

 

A simple example is

Primary Key                       Data

1234                                    41,EU ESMA,European Securities Market Authority, 42, FCA, Financial Conduct Authority

Should look like: 

Primary Key                       Data

1234                                    EU ESMA (European Securities Market Authority)

1234                                    FCA (Financial Conduct Authority)

 

Sounds simple!  But it soon gets complicated, there are often multiple publishers and sometimes the name of the publisher has commas in it as well, so I really need to use the numbers to signify where the publisher changes.

 

Here are some more complicated ones!!

Primary Key                       Data

5678                                    49,FinCEN,Financial Crimes Enforcement Network,106,US Treasury,US Department of the Treasury

Should look like: 

Primary Key                      Data

5678                                   FinCEN (Financial Crimes Enforcement Network)

5678                                   US Treasury (US Treasury,US Department of the Treasury)

 

 

Primary Key                       Data

9876                                    12,Basel,Bank for International Settlements (BIS), and Basel Committee,19,CFTC,Commodities Futures Trading                                                                      Commission,46,FDIC,Federal Deposit Insurance Corporation,47,Fed,Federal Reserve Board, (US Central Bank),87,OCC,Office                                              of the Comptroller of the Currency,313,OpCC,Options Clearing

Should look like:

Primary Key                       Data

9876                                    Basel (Bank for International Settlements (BIS), and Basel Committee)

9876                                    CFTC (Commodities Futures Trading Commission)

9876                                    FDIC (Federal Deposit Insurance Corporation)

9876                                    Fed (Federal Reserve Board, (US Central Bank))

9876                                    OCC (Office of the Comptroller of the Currency)

9876                                    OpCC (Options Clearing)

 

 

 

5 REPLIES 5
fmvizcaino
17 - Castor
17 - Castor

Hi @craigja ,

 

I'm using tokenize option from regex and text to columns to achieve the main part of your problem.

Attached is an example showing how to get that done.

 

Let me know if this works for you.

Best,

Fernando Vizcaino

afv2688
16 - Nebula
16 - Nebula

Hello @craigja ,

 

Hope this helps 🙂

 

Untitled.png

craigja
8 - Asteroid

Yep, that looks good!  Is there any way to remove . (full stops) and also sometimes there are 2 commas, where there are 2 next to each other, can I remove 1 of them?

afv2688
16 - Nebula
16 - Nebula

You can remove commas and fulls tops from the beggining and end using the Trim function:

 

Trim([Field],  ",")

or

Trim([Field], ".")

 

You can also eliminate easily any character within a string using regex function:

 

Regex_Replace([Field], <Character to look for>, <Character to delete>)

 

If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

 

Regards

fmvizcaino
17 - Castor
17 - Castor

Hi @craigja ,

 

Take a look at my example as I'm doing exactly what you requested.

Let me know if that works for you.

 

Best,

Fernando Vizcaino

Labels