I need to extract each university from the String field and place it in a new column, Institution.
The University could be named in many different ways throughout all the records.
Is this a job for fuzzy matching or something else?
String | Institution |
University of Limerick - Centre for Robotics & Intelligent Systems< | University of Limerick |
University of Limerick - Synthesis & Solid State Pharmaceutical Centre (SSPC)< | University of Limerick |
University College Dublin - UCD School of Languages, Cultures & Linguistics<br | University College Dublin |
University of Leiden - Institute of Political Science of the Faculty of Social and Behavioural Sciences < | University of Leiden |
Dublin City University - School of Applied Language and Intercultural Studies<br | Dublin City University |
University of British Columbia<b | University of British Columbia |
NHTV Breda University of Applied Sciences - The Academy for Digital Entertainment<b | NHTV Breda University of Applied Sciences |
Warwick Business School, The University of Warwick | University of Warwick |
University College Dublin - UCD School of Mathematics and Statistics<br | University College Dublin |
Hello,
I have used a parsing tool to pull the relevant string from the text. The challenge is identifying the logic to break it out.
I used the RegEx tool with the Output Method set to Parse and the following regular expression: (.*-|.*&|,.*). I'm sure somebody in the community can put together a shorter expression, but this got me to the goal.
Please see completed workflow attached.
Thanks,
Nick
Hey @NickC,
I do have an alternative to your wonderful formula:
(.*?)\s*[^a-z\s].*
It is a bit wordy, but here it goes:
(.*?) = Create a group of any characters, as few as possible, up until the first time that you encounter whatever comes next.
\s* = 0 or more whitespace characters, followed by
[^a-z\s] = one character that is neither a letter nor whitespace
.* = followed by anything
The tool defaults to case-insensitive matching, so a-z also covers capital letters. Based upon the examples provided, this will work for you @jackdaniels. The extra benefit is that there will be no trailing spaces in the output.
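For anyone who wants to try the expression outside the workflow, here is a minimal Python sketch of the same pattern. The function name `extract_institution` and the fallback for strings with no delimiter are assumptions for illustration, not part of the RegEx tool; the sample strings are copied from the question as posted.

```python
import re

# The pattern from this post. re.IGNORECASE mirrors the
# case-insensitive default mentioned above.
PATTERN = re.compile(r"(.*?)\s*[^a-z\s].*", re.IGNORECASE)


def extract_institution(text: str) -> str:
    """Return the text before the first character that is neither a letter nor whitespace."""
    match = PATTERN.match(text)
    if match is None:
        # Assumption: if the string contains only letters and spaces,
        # keep the whole trimmed string rather than returning nothing.
        return text.strip()
    return match.group(1)


samples = [
    "University of Limerick - Centre for Robotics & Intelligent Systems<",
    "University of British Columbia<b",
    "Warwick Business School, The University of Warwick",
]

for sample in samples:
    print(repr(extract_institution(sample)))
```

One thing this makes visible: on the Warwick row the pattern stops at the comma and returns "Warwick Business School" rather than "University of Warwick", so records written in that order would need separate handling.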
Cheers,
Mark
That is amazing. I really wish I properly understood regex.
Thanks @NickC and @MarqueeCrew for the explanation.
Here's a video to start you off...
Cheers,
Mark
hi @jackdaniels,
While you are learning regex, you can also get the same results using Text To Columns tools. I am attaching the solved workflow.
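The attached workflow is not reproduced here, so the following is only a rough Python sketch of the delimiter-splitting idea, not the workflow itself. It assumes the split is on a hyphen or a `<` character and that only the first piece is kept; the function name `first_column` and the delimiter choice are hypothetical.

```python
import re


def first_column(text: str, delimiters: str = "-<") -> str:
    """Split on the first occurrence of any delimiter character and keep the left part."""
    # Build a character class from the delimiter characters (assumed, see above).
    splitter = "[" + re.escape(delimiters) + "]"
    parts = re.split(splitter, text, maxsplit=1)
    # Trim the space left behind before the delimiter.
    return parts[0].strip()


print(first_column("Dublin City University - School of Applied Language and Intercultural Studies<br"))
print(first_column("University of British Columbia<b"))
```

A string with none of the assumed delimiters, such as the Warwick row, comes back unchanged under this sketch.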