OK, I'll admit I'm not well versed in Regex, so any help would be great. I am doing text analytics, and when I use the Data Cleansing tool I end up with some long strings that are junk. They mostly come from people copying and pasting emails into our CRM tool, and they are usually URLs or email addresses like this:
BOBBYJOEWESTERNFRONTDIVISIONOFINTERNALCOM
None of these long strings are useful, so I'm trying to figure out a way to find any word that has 15 or more characters and replace it with a blank (or " "). I do not want an exact match on the word or on the number of characters, only a match on length >= the threshold.
I feel like Regex_replace should be able to do this, but I am not sure how to code the pattern to make it work.
Dan
Are you looking for something similar to the below?
if Length([Field1]) > 15 then "" else [Field1] endif
Hi Christine,
That's on the right track, but I don't want to replace the whole field with a blank, just the long word within the field, and leave all the other normally sized words alone. The field is a comment field from our CRM system, so it can contain hundreds or thousands of legitimate words. The long ones like the example I posted are not legitimate, so I just want to remove those words.
Sorry, I would post an example of the data set if I could mock up something that didn't contain sensitive information.
Let me know if that clarifies things. If not, I'll try to mock something up.
D
Hi @danielreedsmith, you can do it in several ways. Take a look at the three examples in the attached workflow.
If this solves your issue please mark the answer "Solved", if not let me know!
(or provide us 2 or 3 dummy records along with your expected output layout)
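For anyone reading without the attachment, here is a minimal sketch of the general regex idea, written in Python rather than as an Alteryx workflow. It is not a copy of the attached examples. The threshold of 15 comes from the question; the names `MIN_JUNK_LENGTH`, `LONG_WORD` and `strip_long_words`, and the sample comment, are made up for illustration.

```python
import re

# Assumption taken from the question: words of 15+ characters are junk.
MIN_JUNK_LENGTH = 15

# \b\w{15,}\b matches a run of 15 or more word characters
# (letters, digits, underscore) sitting between word boundaries.
LONG_WORD = re.compile(r"\b\w{%d,}\b" % MIN_JUNK_LENGTH)


def strip_long_words(text: str) -> str:
    """Remove every long word, then collapse the leftover whitespace."""
    without_long = LONG_WORD.sub(" ", text)
    return " ".join(without_long.split())


# Hypothetical CRM comment containing one pasted junk string.
comment = "Please call BOBBYJOEWESTERNFRONTDIVISIONOFINTERNALCOM about the renewal"
print(strip_long_words(comment))  # prints "Please call about the renewal"
```

The same pattern should carry over to a formula along the lines of REGEX_Replace([Field1], "\b\w{15,}\b", " "), though I have not tested that against your data. One caveat: `\w` stops at punctuation, so an email address or URL is measured piece by piece between the dots, slashes and @ signs. If whole pasted addresses need to go, swapping `\w` for `\S` (any non-whitespace character) is one option, at the cost of also catching long words that have punctuation attached.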
For some reason this doesn't work when a cell has a lot of words in it. I still found the longer words.