Alteryx Designer Desktop Discussions

SOLVED

Parse all table row tags <TR>.*</TR> as rows

hellyars

I have a minified HTML source file of 112 MB in which 85,000 table records are trapped in one giant paragraph.

Goal: extract each table row as its own row. In crude terms, I want (<tr.*?</tr>); that is, one row for each opening <tr> and closing </tr> tag pair, with everything in between.

I tried (\<tr.*?\<\/tr\>), but it did not work.

<tr><td><table><tr><td align="right" valign="top"><b>Number:</b></td><td><span>TACO ITEMS</span></td></tr><tr><td align="right" valign="top"><b>Organization ID:</b></td><td><span>abcd</span></td></tr><tr><td align="right" valign="top"><b>Name:</b></td><td><span>MORE TACOS PLS</span></td></tr><tr><td align="right" valign="top"><b>TPP Create Date:</b></td><td><span>2020-03-16 08:53:55 EDT</span></td></tr><tr><td align="right" valign="top"><b>Last MENU ITEMS:</b></td><td></td></tr></table></td></tr><tr><td><div class="jstablecontainer"><table class="layoutTable"><tbody><tr><td><div class="frame_outer"><div class="frame"><span><div class="frameTitle" summary="null"><table class="layoutTable100"><tbody><tr><td class="title" nowrap="">MENU Structure</td><td class="objCount" nowrap="">(82,104 objects)</td></tr></tbody></table></div><div class="frameContent" style="width: 100%;"><table border="1" cellpadding="1" cellspacing="1" class="tablecellsepbg frameTable"><thead><tr><th class="tablecolumnheaderbg" nowrap="" scope="col"><span class="tablecolumnheaderfont">Number</span></th><th class="tablecolumnheaderbg" nowrap="" scope="col"><span class="tablecolumnheaderfont">Name</span></th><th class="tablecolumnheaderbg" nowrap="" scope="col"><span class="tablecolumnheaderfont">UPC Code</span></th><th class="tablecolumnheaderbg" nowrap="" scope="col"><span class="tablecolumnheaderfont">Version</span></th><th class="tablecolumnheaderbg" nowrap="" scope="col"><span class="tablecolumnheaderfont">Context</span></th><th class="tablecolumnheaderbg" nowrap="" scope="col"><span class="tablecolumnheaderfont">QUANTITY</span></th><th class="tablecolumnheaderbg" nowrap="" scope="col"><span class="tablecolumnheaderfont">UOM</span></th></tr></thead><tbody class="tablebody" id="tb__netmarkets.wp.wpcontent">

<tr class="o"><td class="c tabledatacell" nowrap=""><span>TACO ITEMS</span></td><td class="c tabledatacell" nowrap=""><span>TACO FAMILY</span></td><td class="c tabledatacell" nowrap=""><span>abcd</span></td><td class="c tabledatacell" nowrap=""><span>X0.3</span></td><td class="c tabledatacell" nowrap=""><span>BURRIOT</span></td><td class="c tabledatacell" nowrap=""><span></span></td><td class="c tabledatacell" nowrap=""><span></span></td></tr>
<tr class="e"><td class="c tabledatacell" nowrap=""><img border="0" hspace="0" src="data&colon;image/gif;base64,R0lGODlhEAAQAIABALCypf///yH5BAEAAAEALAAAAAAQABAAQAIdjB+Ay+D/WJJU0XZxRnab7oGbmJGXWXkHKrEaUwAAOw==" vspace="0"><span>abcd2472</span></td><td class="c tabledatacell" nowrap=""><span>KIT LIST</span></td><td class="c tabledatacell" nowrap=""><span>abcd</span></td><td class="c tabledatacell" nowrap=""><span>C.5</span></td><td class="c tabledatacell" nowrap=""><span>TACO</span></td><td class="c tabledatacell" nowrap=""><span>1.0</span></td><td class="c tabledatacell" nowrap=""><span>each</span></td></tr><tr class="o"><td class="c tabledatacell" nowrap=""><img border="0" hspace="0" src="data&colon;image/gif;base64,R0lGODlhEAAQAIAAALCypS5KbCH5BAEAAAEALAAAAAAQABAAAAIbjB+Ay8qf4HMS0Wou1pVLD4ETZpGH2JiZGj0FADs=" vspace="0"><img border="0" hspace="0" src="data&colon;image/gif;base64,R0lGODlhEAAQAIABALCypf///yH5BAEAAAEALAAAAAAQABAAQAIdjB+Ay+D/WJJU0XZxRnab7oGbmJGXWXkHKrEaUwAAOw==" vspace="0"><span>57K8780-001</span></td><td class="c tabledatacell" nowrap=""><span>BK</span></td><td class="c tabledatacell" nowrap=""><span>abcd</span></td><td class="c tabledatacell" nowrap=""><span>-.5</span></td><td class="c tabledatacell" nowrap=""><span>FISH</span></td><td class="c tabledatacell" nowrap=""><span>1.0</span></td><td class="c tabledatacell" nowrap=""><span>each</span></td></tr><tr class="e"><td class="c tabledatacell" nowrap=""><img border="0" hspace="0" src="data&colon;image/gif;base64,R0lGODlhEAAQAIAAALCypS5KbCH5BAEAAAEALAAAAAAQABAAAAIbjB+Ay8qf4HMS0Wou1pVLD4ETZpGH2JiZGj0FADs=" vspace="0"><img border="0" hspace="0" src="data&colon;image/gif;base64,R0lGODlhEAAQAIAAALCypS5KbCH5BAEAAAEALAAAAAAQABAAAAIbjB+Ay8qf4HMS0Wou1pVLD4ETZpGH2JiZGj0FADs=" vspace="0"><img border="0" hspace="0" src="data&colon;image/gif;base64,R0lGODlhEAAQAIABALCypf///yH5BAEAAAEALAAAAAAQABAAQAIdjB+Ay+D/WJJU0XZxRnab7oGbmJGXWXkHKrEaUwAAOw==" vspace="0"><span>12414308-018</span></td><td class="c tabledatacell" 
nowrap=""><span>ONIONS</span></td><td class="c tabledatacell" nowrap=""><span>abcd</span></td><td class="c tabledatacell" nowrap=""><span>D.4</span></td><td class="c tabledatacell" nowrap=""><span>NERD</span></td><td class="c tabledatacell" nowrap=""><span>7.0</span></td><td class="c tabledatacell" nowrap=""><span>each</span></td></tr><tr class="o"><td class="c tabledatacell" nowrap=""><img border="0" hspace="0" src="data&colon;image/gif;base64,R0lGODlhEAAQAIAAALCypS5KbCH5BAEAAAEALAAAAAAQABAAAAIbjB+Ay8qf4HMS0Wou1pVLD4ETZpGH2JiZGj0FADs=" vspace="0"><img border="0" hspace="0" src="data&colon;image/gif;base64,R0lGODlhEAAQAIAAALCypS5KbCH5BAEAAAEALAAAAAAQABAAAAIbjB+Ay8qf4HMS0Wou1pVLD4ETZpGH2JiZGj0FADs=" vspace="0"><img border="0" hspace="0" src="data&colon;image/gif;base64,R0lGODlhEAAQAIAAALCypS5KbCH5BAEAAAEALAAAAAAQABAAAAIbjB+Ay8qf4HMS0Wou1pVLD4ETZpGH2JiZGj0FADs=" vspace="0"><img border="0" hspace="0" src="data&colon;image/gif;base64,R0lGODlhEAAQAIAAALCypS5KbCH5BAEAAAEALAAAAAAQABAAAAIZjB+Ay8qf4HMS0Wou1pVLAIYhRpbmiaZmAQA7" vspace="0"><span>124247324</span></td><td class="c tabledatacell" nowrap=""><span>CHEESES</span></td><td class="c tabledatacell" nowrap=""><span>abcd</span></td><td class="c tabledatacell" nowrap=""><span>A.3</span></td><td class="c tabledatacell" nowrap=""><span>CERVELO</span></td><td class="c tabledatacell" nowrap=""><span>1.0</span></td><td class="c tabledatacell" nowrap=""><span>each</span></td></tr><tr class="e"><td class="c tabledatacell" nowrap=""><img border="0" hspace="0" src="data&colon;image/gif;base64,R0lGODlhEAAQAIAAALCypS5KbCH5BAEAAAEALAAAAAAQABAAAAIbjB+Ay8qf4HMS0Wou1pVLD4ETZpGH2JiZGj0FADs=" vspace="0"><img border="0" hspace="0" src="data&colon;image/gif;base64,R0lGODlhEAAQAIAAALCypS5KbCH5BAEAAAEALAAAAAAQABAAAAIbjB+Ay8qf4HMS0Wou1pVLD4ETZpGH2JiZGj0FADs=" vspace="0"><img border="0" hspace="0" src="data&colon;image/gif;base64,R0lGODlhEAAQAIABALCypf///yH5BAEAAAEALAAAAAAQABAAQAIdjB+Ay+D/WJJU0XZxRnab7oGbmJGXWXkHKrEaUwAAOw==" 
vspace="0"><span>abcd2317</span></td><td class="c tabledatacell" nowrap=""><span>SRAME ETAP,</span></td><td class="c tabledatacell" nowrap=""><span>abcd</span></td><td class="c tabledatacell" nowrap=""><span>B.4</span></td><td class="c tabledatacell" nowrap=""><span>TACO</span></td><td class="c tabledatacell" nowrap=""><span>6.0</span></td><td class="c tabledatacell" nowrap=""><span>each</span></td></tr><tr class="o"><td class="c tabledatacell" nowrap=""><img border="0" hspace="0" src="data&colon;image/gif;base64,R0lGODlhEAAQAIAAALCypS5KbCH5BAEAAAEALAAAAAAQABAAAAIbjB+Ay8qf4HMS0Wou1pVLD4ETZpGH2JiZGj0FADs=" vspace="0"><img border="0" hspace="0" src="data&colon;image/gif;base64,R0lGODlhEAAQAIAAALCypS5KbCH5BAEAAAEALAAAAAAQABAAAAIbjB+Ay8qf4HMS0Wou1pVLD4ETZpGH2JiZGj0FADs=" vspace="0"><img border="0" hspace="0" src="data&colon;image/gif;base64,R0lGODlhEAAQAIABALCypf///yH5BAEAAAEALAAAAAAQABAAQAIdjB+Ay+D/WJJU0XZxRnab7oGbmJGXWXkHKrEaUwAAOw==" vspace="0"><span>abcd7223-002</span></td><td class="c tabledatacell" nowrap=""><span>LETTUCE</span></td><td class="c tabledatacell" nowrap=""><span>abcd</span></td><td class="c tabledatacell" nowrap=""><span>B.5</span></td><td class="c tabledatacell" nowrap=""><span>SAUCE</span></td><td class="c tabledatacell" nowrap=""><span>1.0</span></td><td class="c tabledatacell" nowrap=""><span>each</span></td></tr><tr class="e"><td class="c tabledatacell" nowrap=""><img border="0" hspace="0" src="data&colon;image/gif;base64,R0lGODlhEAAQAIAAALCypS5KbCH5BAEAAAEALAAAAAAQABAAAAIbjB+Ay8qf4HMS0Wou1pVLD4ETZpGH2JiZGj0FADs=" vspace="0"><img border="0" hspace="0" src="data&colon;image/gif;base64,R0lGODlhEAAQAIAAALCypS5KbCH5BAEAAAEALAAAAAAQABAAAAIbjB+Ay8qf4HMS0Wou1pVLD4ETZpGH2JiZGj0FADs=" vspace="0"><img border="0" hspace="0" src="data&colon;image/gif;base64,R0lGODlhEAAQAIAAALCypS5KbCH5BAEAAAEALAAAAAAQABAAAAIbjB+Ay8qf4HMS0Wou1pVLD4ETZpGH2JiZGj0FADs=" vspace="0"><img border="0" hspace="0" 
src="data&colon;image/gif;base64,R0lGODlhEAAQAIAAALCypS5KbCH5BAEAAAEALAAAAAAQABAAAAIZjB+Ay8qf4HMS0Wou1pVLAIYhRpbmiaZmAQA7" vspace="0"><span>5g354</span></td><td class="c tabledatacell" nowrap=""><span>SPICES</span></td><td class="c tabledatacell" nowrap=""><span>abcd</span></td><td class="c tabledatacell" nowrap=""><span>C.2</span></td><td class="c tabledatacell" nowrap=""><span>TOSTADA</span></td><td class="c tabledatacell" nowrap=""><span>0.0</span></td><td class="c tabledatacell" nowrap=""><span>as needed</span></td></tr>
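Two features of the sample above could explain why (\<tr.*?\<\/tr\>) misbehaves, although the exact symptom isn't stated: the pasted rows contain line breaks, which "." does not match by default, and the first <tr> wraps a nested <table>, so a lazy match that starts at that outer <tr> ends at the first inner </tr>. The Python sketch below uses Python's re module as a stand-in (Alteryx's RegEx tool may differ in details such as newline handling) to reproduce both effects on a hypothetical, trimmed-down SAMPLE. It then matches only the data rows, assuming they are exactly the <tr class="o"> and <tr class="e"> rows and never contain a nested table. SAMPLE, DATA_ROW, CELL_TEXT and data_rows are illustrative names, not part of any tool.

```python
import re
from typing import List

# Hypothetical, trimmed-down excerpt of the question's HTML: an outer <tr> that
# wraps a nested table, then two data rows (the first contains a line break,
# as the pasted sample does).
SAMPLE = (
    '<tr><td><table><tr><td><b>Number:</b></td>'
    '<td><span>TACO ITEMS</span></td></tr></table></td></tr>'
    '<tr class="o"><td class="c tabledatacell" nowrap=""><span>TACO ITEMS</span></td>'
    '<td class="c tabledatacell"\nnowrap=""><span>TACO FAMILY</span></td></tr>'
    '<tr class="e"><td class="c tabledatacell" nowrap=""><span>abcd2472</span></td>'
    '<td class="c tabledatacell" nowrap=""><span>KIT LIST</span></td></tr>'
)

# The question's pattern; "<" and "/" need no escaping in Python regex.
NAIVE = re.compile(r"<tr.*?</tr>")
# Same pattern with DOTALL, so "." also matches line breaks inside a row.
NAIVE_DOTALL = re.compile(r"<tr.*?</tr>", re.DOTALL)
# Assumption: data rows are exactly <tr class="o"> / <tr class="e"> and never
# contain a nested table, so a lazy match to the next </tr> is safe for them.
DATA_ROW = re.compile(r'<tr class="[oe]">(.*?)</tr>', re.DOTALL)
CELL_TEXT = re.compile(r"<span>(.*?)</span>", re.DOTALL)


def data_rows(html: str) -> List[List[str]]:
    """Return one list of <span> texts per data row."""
    return [CELL_TEXT.findall(m.group(1)) for m in DATA_ROW.finditer(html)]


# Without DOTALL the class="o" row is skipped (it spans a line break), and the
# first match stops at the nested table's </tr> instead of the outer one.
print(len(NAIVE.findall(SAMPLE)))         # 2
# With DOTALL all three rows match, but the first is still cut off early.
print(len(NAIVE_DOTALL.findall(SAMPLE)))  # 3
print(data_rows(SAMPLE))  # [['TACO ITEMS', 'TACO FAMILY'], ['abcd2472', 'KIT LIST']]
```

For the real 112 MB file, the same function could be applied to the whole file read into one string, e.g. data_rows(open(path, encoding="utf-8").read()), where path and the UTF-8 encoding are placeholders. Restricting the pattern to the classed rows sidesteps the nesting problem only because, in the sample, those rows contain no inner tables.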

 

DavidP

Hi @hellyars

Have a look and see if the workflow below helps you out.

[Screenshot of the suggested workflow]
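As an alternative outside Designer (a different technique, not a reconstruction of the workflow above), an HTML parser can track the <tr> nesting instead of relying on a regex. This is a minimal sketch with Python's standard html.parser module, again assuming that the data rows are the <tr class="o"> and <tr class="e"> rows from the question; RowCollector, parse_rows and SAMPLE are hypothetical names.

```python
from html.parser import HTMLParser
from typing import List, Optional


class RowCollector(HTMLParser):
    """Collect cell text from <tr class="o"> and <tr class="e"> rows.

    Assumption (from the sample in the question): data rows carry class "o"
    or "e"; unclassed layout rows are skipped, however deeply nested.
    """

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[List[str]] = []
        self._depth = 0                         # current <tr> nesting depth
        self._data_depth: Optional[int] = None  # depth of the data row being read
        self._cell: Optional[List[str]] = None  # text pieces of the current <td>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._depth += 1
            if self._data_depth is None and dict(attrs).get("class") in ("o", "e"):
                self._data_depth = self._depth
                self.rows.append([])
        elif tag == "td" and self._depth == self._data_depth:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None and self._depth == self._data_depth:
            self.rows[-1].append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr":
            if self._depth == self._data_depth:
                self._data_depth = None
            self._depth -= 1


def parse_rows(html: str) -> List[List[str]]:
    """Feed the whole document and return one list of cell texts per data row."""
    collector = RowCollector()
    collector.feed(html)
    collector.close()
    return collector.rows


# Hypothetical, trimmed-down excerpt of the question's HTML.
SAMPLE = (
    '<tr><td><table><tr><td><b>Number:</b></td>'
    '<td><span>TACO ITEMS</span></td></tr></table></td></tr>'
    '<tr class="o"><td class="c tabledatacell" nowrap=""><span>TACO ITEMS</span></td>'
    '<td class="c tabledatacell" nowrap=""><span>TACO FAMILY</span></td></tr>'
    '<tr class="e"><td class="c tabledatacell" nowrap=""><img border="0" vspace="0">'
    '<span>abcd2472</span></td><td class="c tabledatacell" nowrap=""><span>1.0</span></td></tr>'
)

print(parse_rows(SAMPLE))  # [['TACO ITEMS', 'TACO FAMILY'], ['abcd2472', '1.0']]
```

Because the parser counts <tr> depth, the nested layout table in the first row is skipped rather than truncating a match, and the <img> icons in a cell contribute no text.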

 

hellyars

@DavidP That is highly useful. Thank you.
