Regex- Tokenize

garretwalters12
8 - Asteroid

Why is my Tokenize parse not working correctly? I am expecting each "<juice>.*</juice>" combo to be in its own column.

 

Text Data- 

Field 1
the beginning of my questions<juice>apple</juice>alphabet soup is good<juice>orange</juice>the earth is round<juice>cranberry</juice>go lakers

 

[Image: garretwalters12_0-1611026556071.png]

 

Output- 

Field 1
the beginning of my questions<juice>apple</juice>alphabet soup is good<juice>orange</juice>the earth is round<juice>cranberry</juice>go lakers

Field 11
<juice>apple</juice>alphabet soup is good<juice>orange</juice>the earth is round<juice>cranberry</juice>
BretCarr
10 - Fireball

It's really bad when I just watch this board for REGEXs to solve 😂.

 

I think yours is fairly easy: you are just missing the parentheses that capture the juice name. I also changed it to word characters (\w) instead of the period, which also matches white space and will gum up your results.

 

\<juice\>(\w*)\<\/juice\>

 

I like to escape all the symbol characters with a backslash, just in case. Let me know if it works!
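For anyone who wants to check the difference outside Alteryx, here is a rough sketch in Python. It assumes Python's `re` module behaves closely enough to the RegEx tool's Tokenize mode for this comparison (one result per match, and only the captured group is returned), which is an assumption, not something confirmed in this thread. The backslashes before `<` and `>` are left out because Python does not need them.

```python
import re

# Sample text from the original question.
text = (
    "the beginning of my questions<juice>apple</juice>alphabet soup is good"
    "<juice>orange</juice>the earth is round<juice>cranberry</juice>go lakers"
)

# Greedy .* runs to the LAST </juice>, so the whole span is a single match.
greedy = re.findall(r"<juice>.*</juice>", text)

# With a capture group and \w*, each tag is its own match and only the
# text inside the parentheses is returned.
tokens = re.findall(r"<juice>(\w*)</juice>", text)

print(len(greedy))  # 1
print(tokens)       # ['apple', 'orange', 'cranberry']
```

The single greedy match is the same long string that showed up in Field 11 in the question.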

sparksun
11 - Bolide

Here is my solution.

 

[Image: sparksun_0-1611049645985.png]

 

garretwalters12
8 - Asteroid

Thank you for the response. This did work; however, I would prefer to use (.*) instead of (<\w+\>) to make it more dynamic in case a number or special character comes through in the input. Any ideas?

garretwalters12
8 - Asteroid

No luck, now no data is parsing. I would also prefer to stay away from the (\w*) to make it more dynamic, in case a number or special character comes through in the input.
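One possible way to keep the pattern dynamic is a lazy quantifier, `(.*?)`, which stops at the first closing tag it can reach instead of the last one. This was not suggested in the thread, so treat it as a sketch; it is shown in Python, and whether Tokenize in Alteryx behaves identically is an assumption. The helper name `juice_tokens` and the sample string are made up for illustration.

```python
import re

def juice_tokens(text):
    """Return the text inside every <juice>...</juice> pair."""
    # .*? is lazy: it expands one character at a time and stops at the
    # first </juice>, so digits, spaces and punctuation are all allowed.
    return re.findall(r"<juice>(.*?)</juice>", text)

# Hypothetical input with a number, a space and special characters.
sample = (
    "intro<juice>apple 2.0</juice>middle"
    "<juice>o-range!</juice>end<juice>cran berry</juice>"
)

print(juice_tokens(sample))  # ['apple 2.0', 'o-range!', 'cran berry']
```

Because each match ends at the nearest closing tag, this does not depend on how many juice tags the field contains.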

BretCarr
10 - Fireball

I think in order to answer your question better, we need to know more about the data surrounding the information. Will there always be three <juice> tags? If so, that makes all the difference:

 

\<juice\>(.*)\<\/juice\>[\s\S]*\<juice\>(.*)\<\/juice\>[\s\S]*\<juice\>(.*)\<\/juice\>

 

That will work every time no matter what as long as there are always three juice tags.
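A quick way to try the three-tag pattern is the Python sketch below. It assumes Python's `re` engine is a fair stand-in for the RegEx tool here, and it drops the backslash escapes, which Python does not need. The variable names are only for illustration.

```python
import re

# Three-tag pattern from the reply above, without the backslash escapes.
THREE_JUICES = re.compile(
    r"<juice>(.*)</juice>[\s\S]*<juice>(.*)</juice>[\s\S]*<juice>(.*)</juice>"
)

text = (
    "the beginning of my questions<juice>apple</juice>alphabet soup is good"
    "<juice>orange</juice>the earth is round<juice>cranberry</juice>go lakers"
)

match = THREE_JUICES.search(text)
print(match.groups())  # ('apple', 'orange', 'cranberry')

# With only two tags there is nothing for the third group to match.
two_tags = "a<juice>apple</juice>b<juice>orange</juice>c"
print(THREE_JUICES.search(two_tags))  # None
```

The second call shows why the tag count matters: the pattern needs all three pairs to be present before it matches anything.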

 

Cheers!
