Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Parsing table from a URL

BonusCup
10 - Fireball

Hi, 

Sorry for the long post, wanted to try and give as much info as I could.  So, I am new at Alteryx and have been testing to do jobs that I've used other tools to do.  I'm trying to pull the table data off one of our company's internal sites and running into issues.  Unfortunately, since it is an internal site I am unable to share the link but I did edit it to only 11 rows for help here.  I found the attached NFL workflow here that I used as a guide but I am getting stuck at the RegEx.  The table I'm trying to pull has around 8,000 rows but when I use the RegEx from the NFL workflow it only returns around 3,000 rows.  I'm able to finish the workflow and save the data off as I need it but comparing the data shows the row count discrepancy.  I am not experienced in using RegEx but I believe this is where the problem is occurring.  Can someone please help with this?  -TIA

 

Workflow screenshot:

parseURL_Capture.JPG

 

RegEx from NFL workflow:

<[!fiohldpMNmcsbtua][^>]*>|<\/[^t][^>]*>|<\/title>|<\/table>|&[a-z]+;

 

Input after Data Cleansing:

<HTML xmlns:java="http://xml.apache.org/xalan/java"> <BODY> <Center> <H1>Example Performance Report</H1> <H2></H2> <H3>07/15/2019</H3> <TABLE border="1"> <TR> <TD bgcolor="CC9933" align="center"><B>Example Name</B></TD> <TD bgcolor="CC9933" align="center"><B>Connection</B></TD> <TD bgcolor="CC9933" align="center"><B>Machine Type</B></TD> <TD bgcolor="CC9933" align="center"><B>TestID</B></TD> <TD bgcolor="CC9933" align="center"><B>Number</B></TD> <TD bgcolor="CC9933" align="center"><B>Released Number</B></TD> <TD bgcolor="CC9933" align="center"><B>Signal Type</B></TD> <TD bgcolor="CC9933" align="center"><B>Affiliation</B></TD> <TD bgcolor="CC9933" align="center"><B>Test Name</B></TD> <TD bgcolor="CC9933" align="center"><B>Test Rank</B></TD> <TD bgcolor="CC9933" align="center"><B>aTest Perc</B></TD> <TD bgcolor="CC9933" align="center"><B>vTest Perc</B></TD> <TD bgcolor="CC9933" align="center"><B>asTest Perc</B></TD> <TD bgcolor="CC9933" align="center"><B>Highest Perc</B></TD> </TR> <TR> <TD bgcolor="FFCC66" align="Center" nowrap="1">TestCh1 CH</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">umi-s0324-t01-m1-au04-d01-ch01-sc00</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">NG</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">47314</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">5681</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">Y</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">Analog</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">&nbsp;</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">NEW YORK</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">1</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">99.39</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">0.00</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">99.44</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">99.44</TD> </TR> <TR> <TD bgcolor="FFCC66" align="Center" nowrap="1">TestCh2 CH</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">umi-s0720-t01-m1-au08-d02-ch01-sc00</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">NG</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">37561</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">5675</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">Y</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">Analog</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">&nbsp;</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">NEW YORK</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">1</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">98.14</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">0.00</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">98.33</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">98.33</TD> </TR> <TR> <TD bgcolor="FFCC66" align="Center" nowrap="1">TestCh3 CH</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">umi-s1486-t01-m1-au01-d02-ch01-sc00</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">NG</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">37537</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">5676</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">Y</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">Analog</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">&nbsp;</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">NEW YORK</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">1</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">91.87</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">0.00</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">93.31</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">93.31</TD> </TR> <TR> <TD bgcolor="FFCC66" align="Center" nowrap="1">TestCh4</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">umi-s0224-t01-m1-au02-d02-ch01-sc00</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">NG</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">18541</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">&nbsp;</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">&nbsp;</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">Analog</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">&nbsp;</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">NEW YORK</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">1</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">0.00</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">0.00</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">92.63</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">92.63</TD> </TR> <TR> <TD bgcolor="FFCC66" align="Center" nowrap="1">TestCh5</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">umi-s0203-t01-m1-au08-d02-ch01-sc00</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">NG</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">15380</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">2141</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">Y</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">Analog</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">&nbsp;</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">NEW YORK</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">1</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">76.23</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">0.00</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">100.00</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">100.00</TD> </TR> <TR> <TD bgcolor="FFCC66" align="Center" nowrap="1">TestCh6</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">umi-s3097-t01-m1-au01-d02-ch01-sc00</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">NG</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">25604</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">&nbsp;</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">&nbsp;</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">Analog</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">&nbsp;</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">NEW YORK</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">1</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">0.00</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">0.00</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">100.00</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">100.00</TD> </TR> <TR> <TD bgcolor="FFCC66" align="Center" nowrap="1">TestCh7</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">umi-s0203-t01-m1-au03-d02-ch01-sc00</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">NG</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">19085</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">2765</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">Y</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">Analog</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">&nbsp;</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">NEW YORK</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">1</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">100.00</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">0.00</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">100.00</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">100.00</TD> </TR> <TR> <TD bgcolor="FFCC66" align="Center" nowrap="1">TestCh8</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">umi-s0700-t01-m1-au01-d02-ch01-sc00</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">NG</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">19086</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">2771</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">Y</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">Analog</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">&nbsp;</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">NEW YORK</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">1</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">100.00</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">0.00</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">100.00</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">100.00</TD> </TR> <TR> <TD bgcolor="FFCC66" align="Center" nowrap="1">TestCh9</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">umi-s0039-t01-m1-au03-d02-ch01-sc00</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">NG</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">11287</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">2766</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">Y</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">Analog</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">&nbsp;</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">NEW YORK</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">1</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">100.00</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">0.00</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">100.00</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">100.00</TD> </TR> <TR> <TD bgcolor="FFCC66" align="Center" nowrap="1">TestCh10</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">umi-s0720-t01-m1-au08-d01-ch01-sc00</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">NG</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">32565</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">4819</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">Y</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">Analog</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">&nbsp;</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">NEW YORK</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">1</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">94.75</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">0.00</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">100.00</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">100.00</TD> </TR> <TR> <TD bgcolor="FFCC66" align="Center" nowrap="1">TestCh11</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">umi-s0224-t01-m1-au05-d02-ch01-sc00</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">NG</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">12746</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">2242</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">Y</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">Analog</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">&nbsp;</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">NEW YORK</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">1</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">100.00</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">0.00</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">100.00</TD> <TD bgcolor="FFCC66" align="Center" nowrap="1">100.00</TD> </TR> </TABLE> </Center> </BODY> </HTML>

3 REPLIES 3
MichalM
Alteryx
Alteryx

@BonusCup 

 

You don't necessarily need to use RegEx if you're not familiar with it. What you'll need to do is identify patterns in the data. This will help you to identify how to extract the bits that you care about.

 

parse-table.png

 

Step 1

Split your HTML into individual rows using the Text-to-Columns tool in Split to rows mode and use \n as a delimiter

 

spli-to-rows.png

 

Step 2

Identify patterns

 

Table headers

The rows with Table headers are one below a row with background colour identified using value "CC9933". Use Multi-Row formula to flag these up. Here's more on how to use it.

 

Table values

All rows with background colour identified using value "FFCC66". Use the Filter tool to get these identified. 

 

Once parsed out, you can bring them back together just like I do in the attached example.

BonusCup
10 - Fireball

@MichalM 

 

Thanks so much for your help.  With your guide I was able to get all the data into one column like the screenshot:

parseURL_Capture2.JPG

 

I needed to create columns and rows for every 14 rows so I found a solution here, Transpose row data to columns.  This gave me exactly what I was looking for.

 

parseURL_Capture3.JPG

 

Thanks again for your help!

MichalM
Alteryx
Alteryx
Perfect! Glad to hear.
Labels