SOLVED

Parse all table row tags <TR>.*</TR> as rows

hellyars
13 - Pulsar

 

I have a minified HTML source file (112 MB). Its 85,000 table records are packed into one giant paragraph.

 

Goal: extract each table row as its own record. In crude terms, I want (<tr.*?</tr>): one output row for each opening <tr> and closing </tr> pair, including everything in between.

 

I tried (\<tr.*?\<\/tr\>), but this did not work.  

 

 

<tr><td><table><tr><td align="right" valign="top"><b>Number:</b></td><td><span>TACO ITEMS</span></td></tr><tr><td align="right" valign="top"><b>Organization ID:</b></td><td><span>abcd</span></td></tr><tr><td align="right" valign="top"><b>Name:</b></td><td><span>MORE TACOS PLS</span></td></tr><tr><td align="right" valign="top"><b>TPP Create Date:</b></td><td><span>2020-03-16 08:53:55 EDT</span></td></tr><tr><td align="right" valign="top"><b>Last MENU ITEMS:</b></td><td></td></tr></table></td></tr><tr><td><div class="jstablecontainer"><table class="layoutTable"><tbody><tr><td><div class="frame_outer"><div class="frame"><span><div class="frameTitle" summary="null"><table class="layoutTable100"><tbody><tr><td class="title" nowrap="">MENU Structure</td><td class="objCount" nowrap="">(82,104 objects)</td></tr></tbody></table></div><div class="frameContent" style="width: 100%;"><table border="1" cellpadding="1" cellspacing="1" class="tablecellsepbg frameTable"><thead><tr><th class="tablecolumnheaderbg" nowrap="" scope="col"><span class="tablecolumnheaderfont">Number</span></th><th class="tablecolumnheaderbg" nowrap="" scope="col"><span class="tablecolumnheaderfont">Name</span></th><th class="tablecolumnheaderbg" nowrap="" scope="col"><span class="tablecolumnheaderfont">UPC Code</span></th><th class="tablecolumnheaderbg" nowrap="" scope="col"><span class="tablecolumnheaderfont">Version</span></th><th class="tablecolumnheaderbg" nowrap="" scope="col"><span class="tablecolumnheaderfont">Context</span></th><th class="tablecolumnheaderbg" nowrap="" scope="col"><span class="tablecolumnheaderfont">QUANTITY</span></th><th class="tablecolumnheaderbg" nowrap="" scope="col"><span class="tablecolumnheaderfont">UOM</span></th></tr></thead><tbody class="tablebody" id="tb__netmarkets.wp.wpcontent">

<tr class="o"><td class="c tabledatacell" nowrap=""><span>TACO ITEMS</span></td><td class="c tabledatacell" nowrap=""><span>TACO FAMILY</span></td><td class="c tabledatacell" nowrap=""><span>abcd</span></td><td class="c tabledatacell" nowrap=""><span>X0.3</span></td><td class="c tabledatacell" nowrap=""><span>BURRIOT</span></td><td class="c tabledatacell" nowrap=""><span></span></td><td class="c tabledatacell" nowrap=""><span></span></td></tr>
<tr class="e"><td class="c tabledatacell" nowrap=""><img border="0" hspace="0" src="data&colon;image/gif;base64,R0lGODlhEAAQAIABALCypf///yH5BAEAAAEALAAAAAAQABAAQAIdjB+Ay+D/WJJU0XZxRnab7oGbmJGXWXkHKrEaUwAAOw==" vspace="0"><span>abcd2472</span></td><td class="c tabledatacell" nowrap=""><span>KIT LIST</span></td><td class="c tabledatacell" nowrap=""><span>abcd</span></td><td class="c tabledatacell" nowrap=""><span>C.5</span></td><td class="c tabledatacell" nowrap=""><span>TACO</span></td><td class="c tabledatacell" nowrap=""><span>1.0</span></td><td class="c tabledatacell" nowrap=""><span>each</span></td></tr><tr class="o"><td class="c tabledatacell" nowrap=""><img border="0" hspace="0" src="data&colon;image/gif;base64,R0lGODlhEAAQAIAAALCypS5KbCH5BAEAAAEALAAAAAAQABAAAAIbjB+Ay8qf4HMS0Wou1pVLD4ETZpGH2JiZGj0FADs=" vspace="0"><img border="0" hspace="0" src="data&colon;image/gif;base64,R0lGODlhEAAQAIABALCypf///yH5BAEAAAEALAAAAAAQABAAQAIdjB+Ay+D/WJJU0XZxRnab7oGbmJGXWXkHKrEaUwAAOw==" vspace="0"><span>57K8780-001</span></td><td class="c tabledatacell" nowrap=""><span>BK</span></td><td class="c tabledatacell" nowrap=""><span>abcd</span></td><td class="c tabledatacell" nowrap=""><span>-.5</span></td><td class="c tabledatacell" nowrap=""><span>FISH</span></td><td class="c tabledatacell" nowrap=""><span>1.0</span></td><td class="c tabledatacell" nowrap=""><span>each</span></td></tr><tr class="e"><td class="c tabledatacell" nowrap=""><img border="0" hspace="0" src="data&colon;image/gif;base64,R0lGODlhEAAQAIAAALCypS5KbCH5BAEAAAEALAAAAAAQABAAAAIbjB+Ay8qf4HMS0Wou1pVLD4ETZpGH2JiZGj0FADs=" vspace="0"><img border="0" hspace="0" src="data&colon;image/gif;base64,R0lGODlhEAAQAIAAALCypS5KbCH5BAEAAAEALAAAAAAQABAAAAIbjB+Ay8qf4HMS0Wou1pVLD4ETZpGH2JiZGj0FADs=" vspace="0"><img border="0" hspace="0" src="data&colon;image/gif;base64,R0lGODlhEAAQAIABALCypf///yH5BAEAAAEALAAAAAAQABAAQAIdjB+Ay+D/WJJU0XZxRnab7oGbmJGXWXkHKrEaUwAAOw==" vspace="0"><span>12414308-018</span></td><td class="c tabledatacell" 
nowrap=""><span>ONIONS</span></td><td class="c tabledatacell" nowrap=""><span>abcd</span></td><td class="c tabledatacell" nowrap=""><span>D.4</span></td><td class="c tabledatacell" nowrap=""><span>NERD</span></td><td class="c tabledatacell" nowrap=""><span>7.0</span></td><td class="c tabledatacell" nowrap=""><span>each</span></td></tr><tr class="o"><td class="c tabledatacell" nowrap=""><img border="0" hspace="0" src="data&colon;image/gif;base64,R0lGODlhEAAQAIAAALCypS5KbCH5BAEAAAEALAAAAAAQABAAAAIbjB+Ay8qf4HMS0Wou1pVLD4ETZpGH2JiZGj0FADs=" vspace="0"><img border="0" hspace="0" src="data&colon;image/gif;base64,R0lGODlhEAAQAIAAALCypS5KbCH5BAEAAAEALAAAAAAQABAAAAIbjB+Ay8qf4HMS0Wou1pVLD4ETZpGH2JiZGj0FADs=" vspace="0"><img border="0" hspace="0" src="data&colon;image/gif;base64,R0lGODlhEAAQAIAAALCypS5KbCH5BAEAAAEALAAAAAAQABAAAAIbjB+Ay8qf4HMS0Wou1pVLD4ETZpGH2JiZGj0FADs=" vspace="0"><img border="0" hspace="0" src="data&colon;image/gif;base64,R0lGODlhEAAQAIAAALCypS5KbCH5BAEAAAEALAAAAAAQABAAAAIZjB+Ay8qf4HMS0Wou1pVLAIYhRpbmiaZmAQA7" vspace="0"><span>124247324</span></td><td class="c tabledatacell" nowrap=""><span>CHEESES</span></td><td class="c tabledatacell" nowrap=""><span>abcd</span></td><td class="c tabledatacell" nowrap=""><span>A.3</span></td><td class="c tabledatacell" nowrap=""><span>CERVELO</span></td><td class="c tabledatacell" nowrap=""><span>1.0</span></td><td class="c tabledatacell" nowrap=""><span>each</span></td></tr><tr class="e"><td class="c tabledatacell" nowrap=""><img border="0" hspace="0" src="data&colon;image/gif;base64,R0lGODlhEAAQAIAAALCypS5KbCH5BAEAAAEALAAAAAAQABAAAAIbjB+Ay8qf4HMS0Wou1pVLD4ETZpGH2JiZGj0FADs=" vspace="0"><img border="0" hspace="0" src="data&colon;image/gif;base64,R0lGODlhEAAQAIAAALCypS5KbCH5BAEAAAEALAAAAAAQABAAAAIbjB+Ay8qf4HMS0Wou1pVLD4ETZpGH2JiZGj0FADs=" vspace="0"><img border="0" hspace="0" src="data&colon;image/gif;base64,R0lGODlhEAAQAIABALCypf///yH5BAEAAAEALAAAAAAQABAAQAIdjB+Ay+D/WJJU0XZxRnab7oGbmJGXWXkHKrEaUwAAOw==" 
vspace="0"><span>abcd2317</span></td><td class="c tabledatacell" nowrap=“"><span>SRAME ETAP,</span></td><td class="c tabledatacell" nowrap=""><span>abcd</span></td><td class="c tabledatacell" nowrap=""><span>B.4</span></td><td class="c tabledatacell" nowrap=""><span>TACO</span></td><td class="c tabledatacell" nowrap=""><span>6.0</span></td><td class="c tabledatacell" nowrap=""><span>each</span></td></tr><tr class="o"><td class="c tabledatacell" nowrap=""><img border="0" hspace="0" src="data&colon;image/gif;base64,R0lGODlhEAAQAIAAALCypS5KbCH5BAEAAAEALAAAAAAQABAAAAIbjB+Ay8qf4HMS0Wou1pVLD4ETZpGH2JiZGj0FADs=" vspace="0"><img border="0" hspace="0" src="data&colon;image/gif;base64,R0lGODlhEAAQAIAAALCypS5KbCH5BAEAAAEALAAAAAAQABAAAAIbjB+Ay8qf4HMS0Wou1pVLD4ETZpGH2JiZGj0FADs=" vspace="0"><img border="0" hspace="0" src="data&colon;image/gif;base64,R0lGODlhEAAQAIABALCypf///yH5BAEAAAEALAAAAAAQABAAQAIdjB+Ay+D/WJJU0XZxRnab7oGbmJGXWXkHKrEaUwAAOw==" vspace="0"><span>abcd7223-002</span></td><td class="c tabledatacell" nowrap=""><span>LETTUCE</span></td><td class="c tabledatacell" nowrap=""><span>abcd</span></td><td class="c tabledatacell" nowrap=""><span>B.5</span></td><td class="c tabledatacell" nowrap=""><span>SAUCE</span></td><td class="c tabledatacell" nowrap=""><span>1.0</span></td><td class="c tabledatacell" nowrap=""><span>each</span></td></tr><tr class="e"><td class="c tabledatacell" nowrap=""><img border="0" hspace="0" src="data&colon;image/gif;base64,R0lGODlhEAAQAIAAALCypS5KbCH5BAEAAAEALAAAAAAQABAAAAIbjB+Ay8qf4HMS0Wou1pVLD4ETZpGH2JiZGj0FADs=" vspace="0"><img border="0" hspace="0" src="data&colon;image/gif;base64,R0lGODlhEAAQAIAAALCypS5KbCH5BAEAAAEALAAAAAAQABAAAAIbjB+Ay8qf4HMS0Wou1pVLD4ETZpGH2JiZGj0FADs=" vspace="0"><img border="0" hspace="0" src="data&colon;image/gif;base64,R0lGODlhEAAQAIAAALCypS5KbCH5BAEAAAEALAAAAAAQABAAAAIbjB+Ay8qf4HMS0Wou1pVLD4ETZpGH2JiZGj0FADs=" vspace="0"><img border="0" hspace="0" 
src="data&colon;image/gif;base64,R0lGODlhEAAQAIAAALCypS5KbCH5BAEAAAEALAAAAAAQABAAAAIZjB+Ay8qf4HMS0Wou1pVLAIYhRpbmiaZmAQA7" vspace="0"><span>5g354</span></td><td class="c tabledatacell" nowrap=""><span>SPICES</span></td><td class="c tabledatacell" nowrap=""><span>abcd</span></td><td class="c tabledatacell" nowrap=""><span>C.2</span></td><td class="c tabledatacell" nowrap=""><span>TOSTADA</span></td><td class="c tabledatacell" nowrap=""><span>0.0</span></td><td class="c tabledatacell" nowrap=""><span>as needed</span></td></tr>

 

DavidP
17 - Castor

Hi @hellyars 

 

Have a look at the workflow below and see whether it helps.

 

[Image: workflow screenshot, DavidP_0-1597708745101.png]
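
The workflow above survives only as a screenshot, so its exact configuration is not recoverable from the text. As a rough illustration of the same goal outside Alteryx, here is a minimal Python sketch. It rests on two assumptions drawn from the sample in the question: some rows contain line breaks, so "." has to be allowed to match newlines, and some <tr> elements wrap nested tables, so a plain lazy match from <tr to </tr> would start at the outer row and stop at the first inner </tr>. The names ROW_PATTERN and iter_rows are hypothetical, and this is not necessarily what the screenshot's workflow does.

```python
import re

# An opening <tr ...> tag, then any text that does not begin another <tr tag,
# up to the nearest </tr>. DOTALL lets "." cross line breaks inside a row.
ROW_PATTERN = re.compile(
    r"<tr\b[^>]*>(?:(?!<tr\b).)*?</tr>",
    re.DOTALL | re.IGNORECASE,
)

def iter_rows(html):
    """Yield each innermost <tr>...</tr> block as one string."""
    for match in ROW_PATTERN.finditer(html):
        yield match.group(0)

# Small stand-in for the 112 MB file: one wrapper row holding a nested table,
# followed by a data row that contains a line break.
sample = (
    '<tr><td><table><tr><td><b>Number:</b></td>'
    '<td><span>TACO ITEMS</span></td></tr></table></td></tr>\n'
    '<tr class="o"><td class="c tabledatacell" \nnowrap="">'
    '<span>ONIONS</span></td></tr>'
)

rows = list(iter_rows(sample))
for row in rows:
    print(repr(row))
```

Note that this pattern returns only innermost rows: a wrapper <tr> whose cell contains another table is skipped, and its inner rows are returned instead. If the wrapper rows matter, a real HTML parser is likely a safer choice than a regular expression.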

 

hellyars
13 - Pulsar

@DavidP  That is highly useful. Thank you.
