Hi Everyone,
How to get rid if whitespace in between the data.
Input:
C lass C Range
Cla ss C Range
Cl ass C Range
De ce mber 31, 20XX
De ce mbe r 31 20XX Valuation Me thodologie s Unobservable Inputs
Out put:
Class C Range
Class C Range
Class C Range
December 31, 20XX
December 31 20XX Valuation Methodologies Unobservable Inputs
Thank you in advance.
Hi @sachinsg
One thing you can do is remove all spaces using replace function.
Getting your desired output its going to be bit complicated.
Hi @sachinsg
Here's a possible solution.
NOTE: This only works properly if all your words start with a capital letter or a digit as in your example data. If some of the words start with a lower case letter, then the words will string together. See the Non-dictionary words below.
Start by making the field longer, adding a record Id and adding a trailing space to make the regex simpler. Split the groups of letters to rows using space as the delimiter. The Multirow Formula tool marks the word segments as the start of a new word or the continuation of the previous one, depending on whether the first character is a capital letter of a digit. The next summarize groups the word segments together and the last one concatenates the words into sentences using a space as a delimiter. The output here looks like this.
I added the last line to the input to show what happens if there are lower case words in the input. To try and find these non-words, the bottom part of the workflow compares all the words that start with a letter with a dictionary using a join and the lower browse shows all words that don't match
The dictionary I used is just made from the list of words in your input. A complete dictionary would be too large to attach to this post, but you can download one here
Dan