Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

find text in html code and put pieces into a table

jimmys
6 - Meteoroid

I have lots of records with an ID and html code. On that html page there are funky chunks of code that I highlighted in red below:

 

more html...

<p style="padding-left: 30.0px;">Bacon ipsum dolor amet salami t-bone pancetta, chuck leberkas tenderloin pork loin. Filet mignon strip steak pig venison meatball chuck spare ribs <span style="font-family: arial, helvetica, sans-serif; font-size: 10pt;">[[--ContentED.9uFvErVUB342rwukUx3H9||Shank sausage pancetta chicken||1758699LJ||Article--]] and [[--ContentED.V1cnQOilHA5234rs4z27ZuA7||Short ribs andouille short loin pork||1753243LJ||Article--]] for more information.</span></p>

<p style="padding-left: 30.0px;">[[[[AssetED.w8dho5pdsfserwerJ0Q7]]]]</p>
<p style="padding-left: 30.0px;">[[[[AssetED.icqFyEsdf354wtgs0HwYqwM9 height="305" width="574"]]]]</p>

...more html

 

I'd like to copy the ID (let's say 2345 for the example above) and the four chunks (or however many there are) into a table that looks like this:

html_code_table.PNG

 

What methods can you suggest? Thanks! - Jimmy

 

3 REPLIES
MarqueeCrew
20 - Arcturus
I've got an idea brewing. I might replace everything between ] and [ (that isn't another ']') with a delimiter.

That leaves JUNK before the first [, which you can eliminate, and JUNK after the last ], which you can eliminate as well.

You'll then have to format the data as required. I'll check in the morning to see if you've solved this.
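For anyone who wants to try the delimiter idea outside of Designer, here is a minimal Python sketch of it. This is only an illustration, not Mark's actual workflow: the function name split_on_bracket_gaps, the shortened sample HTML, and the choice of the ASCII unit separator as the delimiter are all assumptions, and it presumes the HTML contains no stray square brackets of its own.

```python
import re


def split_on_bracket_gaps(html, delimiter="\x1f"):
    """Return the bracketed chunks of html using the delimiter idea.

    Hypothetical helper: assumes the delimiter never occurs in the data
    and that square brackets only appear as part of the chunks.
    """
    # Step 1: replace the non-bracket text between a closing ']' and the
    # next opening '[' with the delimiter, keeping both brackets.
    marked = re.sub(r"\][^\[\]]*\[", lambda m: "]" + delimiter + "[", html)

    # Step 2: drop the junk before the first '[' and after the last ']'.
    start = marked.find("[")
    end = marked.rfind("]")
    if start == -1 or end < start:
        return []  # no bracketed chunks in this record

    # Step 3: split what is left on the delimiter.
    return marked[start:end + 1].split(delimiter)


sample_html = (
    "more html... <span>[[--ContentED.9uFvErVUB342rwukUx3H9||Shank sausage "
    "pancetta chicken||1758699LJ||Article--]] and [[--ContentED."
    "V1cnQOilHA5234rs4z27ZuA7||Short ribs andouille short loin pork||"
    "1753243LJ||Article--]] for more information.</span></p>\n"
    "<p>[[[[AssetED.w8dho5pdsfserwerJ0Q7]]]]</p>\n"
    "...more html"
)

chunks = split_on_bracket_gaps(sample_html)
for chunk in chunks:
    print(chunk)
```

Each printed line is one chunk with its brackets intact. Pairing the chunks with the record ID would still be a separate formatting step.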

Mark
Alteryx ACE & Top Community Contributor
michael_treadwell
ACE Emeritus

The RegEx pattern (\[{1,4}.*?\]{1,4}) should match any string enclosed in one to four opening brackets and one to four closing brackets, including the brackets themselves:

  • ( begins a marked group
  • \[{1,4} matches anywhere between 1 and 4 left bracket characters
  • .*? matches all characters in a lazy (non-greedy) way
  • \]{1,4} matches anywhere between 1 and 4 right bracket characters
  • ) ends a marked group

 

Use the RegEx Tool, set the output method to Tokenize with the pattern above, and select 'Split to Rows'. Each match then lands on its own row alongside the record's ID. I've also attached the module so that you can see for yourself.
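If it helps to see the same pattern outside of Designer, here is a rough Python sketch that imitates tokenizing and splitting to rows. It is not the attached module: the function name tokenize_to_rows, the shortened sample HTML, and the ID 2345 (borrowed from the question) are assumptions for illustration, and Python's re module may differ in small ways from the engine the RegEx Tool uses.

```python
import re

# Same pattern as above: 1-4 '[' characters, a lazy run of any characters,
# then 1-4 ']' characters, all inside one marked group.
TOKEN_PATTERN = re.compile(r"(\[{1,4}.*?\]{1,4})")


def tokenize_to_rows(record_id, html):
    """Return one (record_id, token) tuple per match (hypothetical helper)."""
    return [(record_id, token) for token in TOKEN_PATTERN.findall(html)]


sample_html = (
    "<p>Filet mignon <span>[[--ContentED.9uFvErVUB342rwukUx3H9||Shank "
    "sausage pancetta chicken||1758699LJ||Article--]] and [[--ContentED."
    "V1cnQOilHA5234rs4z27ZuA7||Short ribs andouille short loin pork||"
    "1753243LJ||Article--]] for more information.</span></p>\n"
    "<p>[[[[AssetED.w8dho5pdsfserwerJ0Q7]]]]</p>\n"
    '<p>[[[[AssetED.icqFyEsdf354wtgs0HwYqwM9 height="305" width="574"]]]]</p>'
)

rows = tokenize_to_rows(2345, sample_html)
for row in rows:
    print(row)
```

For this sample the result is four rows, each carrying the ID 2345 and one bracketed chunk. Note that . does not match a newline by default in Python, so a chunk that spanned several lines would need the re.DOTALL flag.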

 

Capture.PNG

jimmys
6 - Meteoroid

Michael,

Beautiful. That worked as I had hoped. Loved learning that too. I have been needing to learn more RegEx, and this helps motivate me to do it. Thank you! -Jimmy
