Hi folks,
I have a challenge! I've sat and looked at this all morning but cant get it to work correctly but I'm sure at least one of you will be up for it 🙂
The data below needs reformatted, it is in one column and in another column I have a primary key (Publication ID). The numbers seem to identify the publisher but I'm not interested in the number, only in using it to split out the data, into separate rows with the primary key. So the pattern I'm looking for is: ignore the number, but take what comes after the number, and before the next number, the first section between the commas is the acronym, I want that, then the next section (before the start of the next number) should be in brackets, which is the full name of the publisher.
A simple example is
Primary Key Data
1234 41,EU ESMA,European Securities Market Authority, 42, FCA, Financial Conduct Authority
Should look like:
Primary Key Data
1234 EU ESMA (European Securities Market Authority)
1234 FCA (Financial Conduct Authority)
Sounds simple! But it soon gets complicated, there are often multiple publishers and sometimes the name of the publisher has commas in it as well, so I really need to use the numbers to signify where the publisher changes.
Here are some more complicated ones!!
Primary Key Data
5678 49,FinCEN,Financial Crimes Enforcement Network,106,US Treasury,US Department of the Treasury
Should look like:
Primary Key Data
5678 FinCEN (Financial Crimes Enforcement Network)
5678 US Treasury (US Treasury,US Department of the Treasury)
Primary Key Data
9876 12,Basel,Bank for International Settlements (BIS), and Basel Committee,19,CFTC,Commodities Futures Trading Commission,46,FDIC,Federal Deposit Insurance Corporation,47,Fed,Federal Reserve Board, (US Central Bank),87,OCC,Office of the Comptroller of the Currency,313,OpCC,Options Clearing
Should look like:
Primary Key Data
9876 Basel (Bank for International Settlements (BIS), and Basel Committee)
9876 CFTC (Commodities Futures Trading Commission)
9876 FDIC (Federal Deposit Insurance Corporation)
9876 Fed (Federal Reserve Board, (US Central Bank))
9876 OCC (Office of the Comptroller of the Currency)
9876 OpCC (Options Clearing)
Solved! Go to Solution.
Hi @craigja ,
I'm using tokenize option from regex and text to columns to achieve the main part of your problem.
Attached is an example showing how to get that done.
Let me know if this works for you.
Best,
Fernando Vizcaino
Yep, that looks good! Is there any way to remove . (full stops) and also sometimes there are 2 commas, where there are 2 next to each other, can I remove 1 of them?
You can remove commas and fulls tops from the beggining and end using the Trim function:
Trim([Field], ",")
or
Trim([Field], ".")
You can also eliminate easily any character within a string using regex function:
Regex_Replace([Field], <Character to look for>, <Character to delete>)
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.
Regards
Hi @craigja ,
Take a look at my example as I'm doing exactly what you requested.
Let me know if that works for you.
Best,
Fernando Vizcaino
User | Count |
---|---|
17 | |
16 | |
14 | |
6 | |
5 |