Trim strings from certain words
Hello,
I'd like to trim string records starting from certain words. For instance, I have company names such as XYZ Limited, XYZ LTD, XYZ AG, XYZ GmbH, and XYZ Gesellschaft mit beschränkter Haftung.
Is there a way to trim these records starting from predefined words, for instance "LTD" or "Gesellschaft"? I was thinking of the RegEx tool, but I can't figure out how to apply it to this issue.
Does anyone have any ideas on how to overcome this problem?
Labels: Preparation, Transformation
You can use the RegEx tool with the Replace functionality.
For example, to parse a string starting with XYZ, use the expression
(XYZ)(.*)
and replace it with the first marked group, $1.
That would pull out just the XYZ.
If you change the order of the groups (or replace with $2 instead), you can get the text before or after your keyword.
I think if you play around with that option you can get what you want. If you could supply a couple of sample records with the before and after, that would be great.
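For anyone who wants to try the capture-group idea outside the RegEx tool, here is a minimal sketch in Python. It assumes Python's re module, where the replacement is written \1 rather than $1; the function names keep_keyword and keep_after are made up for illustration.

```python
import re

# Hypothetical helpers illustrating the (XYZ)(.*) capture-group idea.
# Group 1 is the keyword itself, group 2 is everything after it.

def keep_keyword(text, keyword="XYZ"):
    """Replace 'keyword + rest' with group 1, i.e. drop what follows the keyword."""
    pattern = "(" + re.escape(keyword) + ")(.*)"
    return re.sub(pattern, r"\1", text)

def keep_after(text, keyword="XYZ"):
    """Replace 'keyword + rest' with group 2, i.e. keep only what follows the keyword."""
    pattern = "(" + re.escape(keyword) + ")(.*)"
    return re.sub(pattern, r"\2", text)

print(repr(keep_keyword("XYZ Limited")))  # 'XYZ'
print(repr(keep_after("XYZ Limited")))    # ' Limited'
```

Note that this anchors on the company name, so it only helps if the part you want to keep is known in advance.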
Cheers,
Bob
I'd suggest that you put the strings you want to remove (e.g. LTD, gmbh) in a Text Input tool in a "Find" field and leave blank/empty values in a "Replace" field and then use the Find Replace tool.
You could also nest Replace() functions in the Formula tool:
REPLACE(REPLACE([CompanyName],'LTD',''),'gmbh','')
...but remember that the Formula approach is case-sensitive.
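To make the case-sensitivity point concrete, here is a rough Python equivalent of the nested Replace() formula. This is only a sketch: the function names are hypothetical, and Python's str.replace stands in for the Formula tool's Replace().

```python
import re

def strip_terms(name, terms=("LTD", "gmbh")):
    """Case-sensitive removal, like nesting one replace call per term."""
    for term in terms:
        name = name.replace(term, "")
    return name

def strip_terms_ignore_case(name, terms=("LTD", "gmbh")):
    """Same idea, but matching each term regardless of case."""
    for term in terms:
        name = re.sub(re.escape(term), "", name, flags=re.IGNORECASE)
    return name

print(repr(strip_terms("XYZ LTD")))               # 'XYZ '
print(repr(strip_terms("XYZ GmbH")))              # 'XYZ GmbH' (no match: case differs)
print(repr(strip_terms_ignore_case("XYZ GmbH")))  # 'XYZ '
```

Two things to watch with this approach: the space before the removed term is left behind, and the term is removed wherever it occurs in the string, not only at the end.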
You can use the Regex:
\sLimited$|\sLTD$|\sAG$|\sGmbH$|\sGesellschaft\smit\sbeschränkter\sHaftung$|\sCo$|\sCo\sLTD$
set to case insensitive, with a blank replacement.
The "\s" matches a whitespace character, "$" means end of string, and "|" means OR. Altogether, this means that if it finds any of these phrases at the end of a string, it will remove them.
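Here is the same expression tried out in Python, as a quick way to check its behaviour on sample names. It is a sketch only: re.IGNORECASE stands in for the tool's case-insensitive option, and strip_suffix is a made-up name.

```python
import re

# The expression from the post above, compiled case-insensitively.
SUFFIX_PATTERN = re.compile(
    r"\sLimited$|\sLTD$|\sAG$|\sGmbH$"
    r"|\sGesellschaft\smit\sbeschränkter\sHaftung$|\sCo$|\sCo\sLTD$",
    re.IGNORECASE,
)

def strip_suffix(name):
    """Remove a legal-form suffix, but only when it sits at the end of the string."""
    return SUFFIX_PATTERN.sub("", name)

for sample in [
    "XYZ Limited",
    "XYZ ltd",
    "XYZ Gesellschaft mit beschränkter Haftung",
    "XYZ Co LTD",
    "LTD Holdings AG",
]:
    print(repr(strip_suffix(sample)))
```

Because every alternative ends in "$", a keyword that appears earlier in the name (such as the leading "LTD" in the last sample) is left alone.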
Hi! If you'd like to combine solutions to make it more dynamic, you could do something like the following:
1. Create a text input list of keywords as @tom_montpool suggested
2. Use a Formula to add a RegEx space character \s at the beginning and .* at the end of each keyword to capture the leading space and any characters that might be after the keywords
3. Summarize Tool to concatenate the keywords together, separated by a | to indicate "or"
4. Append concatenated string to the list of company names
5. Use the REGEX_Replace function in the Formula tool to replace anything that matches the keywords in your concatenated list with "" (the formula version of @Joe_Mako's RegEx Tool solution):
REGEX_Replace([CompanyName],[Concat_RegExString], "")
This way, if you found more company suffixes you wanted to add to the list (Inc., Co., etc.), you'd only need to add them to the bottom of the text input list and run again, rather than having to go in and insert them into the RegEx Tool formula. (Note that it still needs to be case sensitive the way I've shown it above.)
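The steps above can be sketched in Python to show what the concatenated expression ends up looking like. This is an illustration under assumptions, not the workflow itself: the keyword list and function names are made up, and re.escape is an extra safety step that the Formula-based workflow does not perform.

```python
import re

# Stand-in for the text input list of keywords (step 1).
KEYWORDS = ["Limited", "LTD", "AG", "GmbH", "Gesellschaft"]

def build_pattern(keywords):
    """Steps 2 and 3: wrap each keyword as \\s<keyword>.* and join with '|'."""
    return "|".join(r"\s" + re.escape(k) + ".*" for k in keywords)

def strip_from_keyword(name, pattern):
    """Step 5: remove the keyword, its leading space, and everything after it."""
    return re.sub(pattern, "", name)

pattern = build_pattern(KEYWORDS)
print(pattern)
print(repr(strip_from_keyword("XYZ Gesellschaft mit beschränkter Haftung", pattern)))
print(repr(strip_from_keyword("XYZ LTD", pattern)))
```

Since each keyword is followed by ".*", a short keyword such as "AG" would also match the start of a longer word (for example " AGENCY"). Adding a word boundary such as \b after the keyword is one possible way to guard against that.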
Cheers! NJ
Thanks a lot, it works perfectly for me. I actually used the RegEx tool directly.
Nevertheless, I have another, related question. How can I do the same thing (RegEx formula) but trim certain words from the beginning of a string?
You could do a similar thing by concatenating the words that you want to remove from the beginning of the string, with a slight modification to step #2: you'd add the ^ symbol to the front of the word, indicating the beginning of the string, and then include \s at the end to match the space that follows the word you're trying to remove.
So your string might look something like ^The\s, if the word you were trying to remove was "The".
Then follow the rest of the steps (if you're concatenating multiple words that you want removed) & use the RegEx replace formula in the example above to strip those word(s) out from the beginning of the string. Is that what you were looking for? Thanks!!
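As a quick illustration of the prefix variant, here is a minimal Python sketch. The word list and function names are made up for the example, and re.escape is an added precaution.

```python
import re

# Hypothetical list of leading words to strip.
PREFIX_WORDS = ["The", "A"]

def build_prefix_pattern(words):
    """Wrap each word as ^<word>\\s and join the alternatives with '|'."""
    return "|".join("^" + re.escape(w) + r"\s" for w in words)

def strip_prefix(name, pattern):
    """Remove a listed word plus its trailing space, only at the start of the string."""
    return re.sub(pattern, "", name)

pattern = build_prefix_pattern(PREFIX_WORDS)
print(pattern)                                         # ^The\s|^A\s
print(repr(strip_prefix("The XYZ Company", pattern)))  # 'XYZ Company'
print(repr(strip_prefix("Theatre Group", pattern)))    # 'Theatre Group'
```

The trailing \s is what keeps "Theatre" from being treated as "The", and the ^ keeps a "The" in the middle of a name untouched.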
NJ
This looks like exactly what I am after, although I am trying to use a lookup to remove
KG
KILOS
LITRES
from strings such as
100x500LITRES
I have tried using your solution, but I'm guessing that there is an issue with there being no spaces in my strings.
Any thoughts?
Thanks for your time,
Fiorano
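One possible explanation, offered as a guess rather than a confirmed diagnosis: the expressions earlier in the thread all start with \s, so they need a space before the keyword, and "100x500LITRES" has none. A minimal Python sketch of a variant that makes the space optional and anchors the unit at the end of the string follows. The names are hypothetical, and sorting the units longest-first is an added precaution.

```python
import re

# Stand-in for the lookup list of units.
UNITS = ["KG", "KILOS", "LITRES"]

def build_unit_pattern(units):
    """Optional whitespace, then any listed unit, anchored at the end of the string."""
    ordered = sorted(units, key=len, reverse=True)  # try longer units first
    return r"\s*(?:" + "|".join(re.escape(u) for u in ordered) + ")$"

def strip_unit(value, pattern):
    """Remove a trailing unit whether or not a space precedes it."""
    return re.sub(pattern, "", value)

pattern = build_unit_pattern(UNITS)
print(repr(strip_unit("100x500LITRES", pattern)))  # '100x500'
print(repr(strip_unit("25 KG", pattern)))          # '25'
```

The same idea should carry over to a concatenated lookup string: leave out the leading \s (or use \s* instead) when building it.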
