Alteryx Designer Desktop Discussions

SOLVED

Table from a URL with multiple tags and elements

BonusCup
10 - Fireball

Hi,

 

I'm working on a project where I'm trying to pull data from the table at https://thestreamable.com/markets/tampa-st-petersburg-sarasota-fl

 

The table shows which local TV stations are available in Tampa for each OTT provider.

 
 

tampaOTT.JPG

I've pulled tables from websites before using Text Input > Download > Text To Columns > Multi-Row Formula, but I can't seem to figure this one out.

 

tableDownload.JPG

Inspecting the page in Chrome, I can see in the Elements panel where the table is, along with the different tags and elements, but I'm not sure how to bring in the data. If you hover over a green dot, it also shows the call letters of the station.

 

Thanks for any help on this.

5 REPLIES
BrandonB
Alteryx

Not the most elegant workflow I have ever built, but it seems to get the job done that you want. Workflow is attached for reference!

 

HTML Parsing.png

 

  

KaneG
Alteryx Alumni (Retired)

Hi @BonusCup,

 

The tables on this page have a couple of things to consider:

  • Each table has multiple header rows, and those rows have differing numbers of columns.
  • The channels column is also tagged as headers.
  • In the screenshot below, the red arrow points to column 1, which sits in row 2 of the header; row 3 doesn't identify this column with a th/td tag.
    • The yellow arrow points to a table detail cell that is also tagged with th rather than td, and
    • The purple arrow points to where the 3rd row of the header splits into 2 columns.
  • These make it hard to tag with the Multi-Row Formula like you normally would. However, the information you need is stored in the rowspan="2" and colspan="2" attributes.

 

KaneG_0-1598924186651.png

 

So, the start is the same as I would always do: look for <table(.*?)</table> and then the <tr> tags for rows.
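For anyone who wants to try the same first step outside Alteryx, here is a minimal Python sketch of it. It reuses the <table(.*?)</table> pattern from above and then splits each table on its <tr> tags. The function name, the sample HTML, and the regex flags are my own assumptions for illustration; they are not taken from the attached workflow, and the real page markup will be messier than this sample.

```python
import re

# Same idea as the RegEx tool step: grab each table, then each row inside it.
# DOTALL lets "." span line breaks; IGNORECASE tolerates <TABLE>/<TR>.
TABLE_RE = re.compile(r"<table(.*?)</table>", re.DOTALL | re.IGNORECASE)
ROW_RE = re.compile(r"<tr\b.*?</tr>", re.DOTALL | re.IGNORECASE)


def split_tables_into_rows(html):
    """Return one list per table, each holding the raw <tr>...</tr> strings."""
    return [ROW_RE.findall(match.group(0)) for match in TABLE_RE.finditer(html)]


# Hypothetical sample markup, loosely shaped like the header described above.
sample_html = """
<table class="demo">
  <tr><th rowspan="2">Channel</th><th colspan="2">Provider</th></tr>
  <tr><th>A</th><th>B</th></tr>
  <tr><th>ABC</th><td>yes</td><td>no</td></tr>
</table>
"""

tables = split_tables_into_rows(sample_html)
for row in tables[0]:
    print(row)
```

Regex is fine for a quick split like this, but it assumes tables are not nested inside each other.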

KaneG_1-1598927309138.png

 

But then it gets more interesting with the rowspan and colspan. After identifying the cells that span multiple columns or rows, a mixture of Multi-Row Formula and Generate Rows (as well as a cheeky trick to make sure the new line appears before the others) will get the order correct.
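To make the rowspan/colspan idea concrete, here is a rough Python sketch of one way to expand spanning cells into a rectangular grid. It is not the Multi-Row Formula and Generate Rows logic from the workflow; it swaps in a simple "fill every covered position" approach that gives the same kind of result. The function names and the sample rows are hypothetical, and it assumes each row has already been isolated as a <tr>...</tr> string.

```python
import re

# Matches <th ...>...</th> or <td ...>...</td>; \b keeps <thead> from matching.
CELL_RE = re.compile(r"<(t[hd])\b([^>]*)>(.*?)</\1>", re.DOTALL | re.IGNORECASE)
TAG_RE = re.compile(r"<[^>]+>")


def span(attrs, name):
    """Read rowspan/colspan from a cell's attribute string, defaulting to 1."""
    match = re.search(r"\b" + name + r'\s*=\s*"?(\d+)', attrs, re.IGNORECASE)
    return int(match.group(1)) if match else 1


def expand_rows(row_html_list):
    """Copy each cell's text into every grid position its spans cover."""
    grid = {}  # (row, col) -> text
    for r, row_html in enumerate(row_html_list):
        c = 0
        for _tag, attrs, inner in CELL_RE.findall(row_html):
            # Skip positions already claimed by a rowspan from a row above.
            while (r, c) in grid:
                c += 1
            text = TAG_RE.sub("", inner).strip()
            rows, cols = span(attrs, "rowspan"), span(attrs, "colspan")
            for dr in range(rows):
                for dc in range(cols):
                    grid[(r + dr, c + dc)] = text
            c += cols
    if not grid:
        return []
    n_rows = max(r for r, _ in grid) + 1
    n_cols = max(c for _, c in grid) + 1
    return [[grid.get((r, c), "") for c in range(n_cols)] for r in range(n_rows)]


# Hypothetical rows: a two-row header where "Channel" spans both header rows.
sample_rows = [
    '<tr><th rowspan="2">Channel</th><th colspan="2">Provider</th></tr>',
    "<tr><th>A</th><th>B</th></tr>",
    "<tr><th>ABC</th><td>yes</td><td>no</td></tr>",
]
grid = expand_rows(sample_rows)
for line in grid:
    print(line)
```

Once every row has the same number of columns, the header rows can be combined into field names and the rest is ordinary cleanup.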

 

KaneG_2-1598927365736.png

 

After that it's just cleanup.

 

I've attached the workflow that these screenshots came from. Some of the column naming might be odd, but that's because I stole parts from a Table Parser Macro that I have (as it didn't deal with the rowspan etc. properly).

GaneshBo
Alteryx

Hi @BonusCup ,

 

I'm not sure how much data you're looking to pull from the website but I hope this can help with the start.

 

Cheers,

Ganesh

BonusCup
10 - Fireball

@BrandonB 

 

Thanks so much for this!  It got me to exactly what I needed.  With a little bit of tweaking, I added an extra piece at the beginning of yours that finds all the URLs for all the markets from https://thestreamable.com/markets and then that flows into your example.
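In case it helps anyone doing this outside Designer, the "find all the market URLs first" step could look roughly like the Python sketch below. It assumes the index page links to each market with an href under /markets/, which matches the Tampa URL but is otherwise a guess about the page. The class and function names and the sample HTML are made up for illustration, and the download of the index page itself is left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

BASE_URL = "https://thestreamable.com/markets"


class MarketLinkParser(HTMLParser):
    """Collect unique links whose path sits under /markets/ on the same host."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        url = urljoin(BASE_URL, href)  # resolve relative links
        parts = urlparse(url)
        same_host = parts.netloc == urlparse(BASE_URL).netloc
        if same_host and parts.path.startswith("/markets/") and url not in self.urls:
            self.urls.append(url)


def find_market_urls(index_html):
    parser = MarketLinkParser()
    parser.feed(index_html)
    return parser.urls


# Hypothetical index markup; only the Tampa slug comes from this thread.
sample_index = """
<a href="/markets/tampa-st-petersburg-sarasota-fl">Tampa</a>
<a href="/markets/example-market">Example</a>
<a href="/about">About</a>
<a href="/markets/tampa-st-petersburg-sarasota-fl">Tampa again</a>
"""
market_urls = find_market_urls(sample_index)
for url in market_urls:
    print(url)
```

Each URL in the list can then be fed to the table-parsing step, the same way the market list flows into the rest of the workflow.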

 

 

BonusCup_0-1598984793628.png

 

Thanks again!

 

BonusCup
10 - Fireball

@KaneG 

 

Thank you for the details for each area.  It was very informative.  I was able to use Brandon's but I'm going to dig into your workflow to look at alternatives for a better understanding.

 

Thanks
