Why is my Tokenize parse not working correctly? I am expecting each "<juice>.*</juice>" combo to be in its own column.
Text Data-
Field 1 |
the beginning of my questions<juice>apple</juice>alphabet soup is good<juice>orange</juice>the earth is round<juice>cranberry</juice>go lakers |
Output-
Field 1 | Field 11 |
the beginning of my questions<juice>apple</juice>alphabet soup is good<juice>orange</juice>the earth is round<juice>cranberry</juice>go lakers | <juice>apple</juice>alphabet soup is good<juice>orange</juice>the earth is round<juice>cranberry</juice> |
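The output above comes back as one long token because .* is greedy. It matches as much as it can and only gives characters back when the rest of the pattern needs them. The minimal sketch below runs the question's pattern on the sample row. It uses Python's re module as a stand-in for the Tokenize tool's regex engine, which is an assumption, but greedy .* behaves the same way in Perl-style engines.

```python
import re

# Sample row from the question's Field 1 (Python re stands in for the tool's engine).
text = ("the beginning of my questions<juice>apple</juice>alphabet soup is good"
        "<juice>orange</juice>the earth is round<juice>cranberry</juice>go lakers")

# Greedy .* runs from the FIRST <juice> to the LAST </juice>,
# so the whole middle of the row comes back as a single match.
greedy = re.findall(r"<juice>.*</juice>", text)

print(len(greedy))  # 1
print(greedy[0])
```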
It's really bad when I just watch this board for REGEXs to solve 😂.
I think yours is fairly easy: you are just missing the parentheses to capture the juice name. I also changed the period to word characters (\w). The period matches any character, including spaces, < and /, so .* keeps going until the last </juice> in the row, and you get one long token instead of three.
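<juice>(\w*)</juice>
I usually escape symbol characters just in case, but not < and > here. In Perl-style (Boost) regex syntax, \< and \> mean start and end of a word rather than literal angle brackets, so escaping them can stop the pattern from matching at all. The / doesn't need escaping either. Let me know if it works!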
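Here is the word-character pattern run on the same row, with the angle brackets written without backslashes. This is a minimal sketch that again uses Python's re as a stand-in for the Tokenize tool's engine, which is an assumption.

```python
import re

# Same sample row as in the question (Python re stands in for the tool's engine).
text = ("the beginning of my questions<juice>apple</juice>alphabet soup is good"
        "<juice>orange</juice>the earth is round<juice>cranberry</juice>go lakers")

# \w cannot match "<" or "/", so each match has to stop at the nearest
# </juice>. With one group, findall returns only the captured names.
names = re.findall(r"<juice>(\w*)</juice>", text)

print(names)  # ['apple', 'orange', 'cranberry']
```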
Here is my solution.
Thank you for the response. This did work; however, I would prefer to use (.*) instead of (<\w+\>) to make it more dynamic, in case a number or special character comes through as input. Any ideas?
No luck; now no data is parsing. I would also prefer to move away from (\w*) to make it more dynamic, in case a number or special character comes through in the input.
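One way to keep the flexibility of .* without running past the first closing tag is a lazy quantifier, .*?. It is not suggested elsewhere in this thread. It expands one character at a time and stops at the first </juice> that lets the match succeed. Perl-style engines, including Boost and Python's re, support it. The sketch below uses Python's re as a stand-in, and the input row is hypothetical, made up to include digits, spaces and punctuation. Note that \w already matches digits (and underscore). It rejects spaces and punctuation.

```python
import re

# Hypothetical row: tag contents with digits, spaces and punctuation.
text = ("intro<juice>blood-orange #2</juice>middle"
        "<juice>v8</juice>more text<juice>apple & pear</juice>end")

# \w* cannot cross "-", " ", "#" or "&", so those tags never match.
word_only = re.findall(r"<juice>(\w*)</juice>", text)

# Lazy .*? accepts any character but stops at the nearest </juice>.
lazy = re.findall(r"<juice>(.*?)</juice>", text)

print(word_only)  # ['v8']
print(lazy)       # ['blood-orange #2', 'v8', 'apple & pear']
```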
To answer your question better, we need to know more about the data around the tags. Will there always be exactly three <juice> tags? If so, that makes all the difference:
<juice>(.*)</juice>[\s\S]*<juice>(.*)</juice>[\s\S]*<juice>(.*)</juice>
That will work every time, as long as there are always exactly three juice tags.
Cheers!
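The sketch below checks the three-group pattern and shows what happens when there are more than three tags. It uses Python's re as a stand-in for the Tokenize tool's engine, which is an assumption. The angle brackets are unescaped, and a fourth tag, <juice>grape</juice>, is a hypothetical addition to the sample row.

```python
import re

PATTERN = r"<juice>(.*)</juice>[\s\S]*<juice>(.*)</juice>[\s\S]*<juice>(.*)</juice>"

three = ("the beginning of my questions<juice>apple</juice>alphabet soup is good"
         "<juice>orange</juice>the earth is round<juice>cranberry</juice>go lakers")
four = three + "<juice>grape</juice>"  # hypothetical extra tag

print(re.search(PATTERN, three).groups())
# ('apple', 'orange', 'cranberry')

# With four tags, the greedy first group grows as far as it can while two
# more <juice> tags still follow, so it swallows the first two tags.
print(re.search(PATTERN, four).groups())
# ('apple</juice>alphabet soup is good<juice>orange', 'cranberry', 'grape')
```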