Hi!
I have a dataset concerning a vast amount of insurance-numbers, all of which I need to parse out and place in separate columns. These numbers are found within a text-field that has very little consistency. However, an insurance-number always has a length of 5-7 digits, thus that's what we are looking for.
Therefore, I applied, Tostring(REGEX_Replace([message],"^.*?(\d{5,7}).*?$","$1")) , in a formula tool. This only did half the job, since one text-field may hold 2 or 3 (max 5) insurance-number, all of which must be parsed out and placed in separate columns. So what I'm looking for is something, be it text-to-colums or parsing, that replicates the statement until there is no more text and NOT until the first match is found. I really hope you are able to help (and probably amaze me)!
Thank you in advance.
FYI:
I have added sample data below that also shows the outcome my formula above resulted in compared to what it should have resulted in.
Text-field | Outcome from formula | Intended outcome1 | Intended outcome2 | Intended outcome3 |
Insurance-number: 123456 and this customer needs special attention attention due to second insurance-number: 654321 | 123456 | 123456 | 654321 | |
Insurance-number was granted 15/7: 999333 | 999333 | 999333 | ||
In. no. 1 (left back door): 224466 and in. no. 2 (right front door): 124578 and in no. 3 (right back door): 334466 | 224466 | 224466 | 124578 | 334466 |
This customer didn't provide in no |
Solved! Go to Solution.
This sounded similar to an issue I've run into before.
Instead of using a regex expression in the Formula tool, we used the RegEx tool to Tokenize what we are looking for in the string. This action will separate out what you Tokenize into a specified number of columns.
Thanks a lot @JosephSerpis!! I have implemented it into my workflow, it works like a charm!
Out of curiosity, compared to the solution below - any particular reason to why you went for 'split to rows' in regex and working your way back to columns, rather than splitting to columns initially; am I missing something crucial here? 🙂
Hi @Alteregoryx splitting to row is more dynamic as you don't need to specify the amount of rows to split by rather than splitting to columns as you have to set a arbitrary number for the amount of columns you want. However if you data has more numbers in the string than the amount of columns you are splitting you will lose data and have to adjust manually where you won't have that issue with splitting into rows then crosstabing the data as it will adjust to data accordingly.