Alteryx Designer Desktop Discussions


Webscraping missing data

Marp
7 - Meteor

Hi all,

I need help with web scraping. Some values are missing from the data that Alteryx scrapes.

The table below compares the original page source with the Alteryx web scraping output. In the Alteryx output, the entire row value highlighted in blue is missing.

Please advise.

 

Alteryx flow:

[Screenshots: Marp_0-1615348723688.png, Marp_1-1615348801234.png, Marp_2-1615348849287.png]

Web URL source:

    <td class="rowLabel" style="width: 210px;">Project:</td>
    <td>
    <span class="drop_hilite">August 2017</span>
    </td>
    <td>
    <span class="add_hilite">December 2019</span>
    </td>
    </tr>
    <tr>
    <td class="rowLabel" style="width: 210px;">Status:</td>
    <td>
    Not <span class="drop_hilite">yet </span>recruiting
    </td>
    <td>
    <span class="add_hilite">Active, </span>not recruiting
    </td>
    </tr>
    <tr>
    <td class="rowLabel" style="width: 210px;"Start:</td>

Alteryx web scraping:

    <td class="rowLabel" style="width: 210px;"Project:</td>
    <td>
    <span class="drop_hilite">August 2017</span> <span class="add_hilite">December 2019</span>      </td>
    </tr>
    <tr>
    <td class="rowLabel" style="width: 210px;">Status:</td>
    <td>
    <span class="add_hilite">Active, </span>not  <span class="drop_hilite">yet </span> recruiting      </td>
    </tr>
    <tr>
    <td class="rowLabel" style="width: 210px;"Start:</td>
    <td>

1 REPLY
BrandonB
Alteryx

This webpage likely generates some of its content dynamically after the initial page load, in which case the Download tool would not capture all of the information. If so, you may want to use the Python tool with Selenium instead. Here is a helpful article that walks you through the process: https://community.alteryx.com/t5/Alteryx-Designer-Discussions/Python-Code-Tool-Web-Scraping-Dynamic-... 

 

Selenium retrieves the HTML just as the Download tool does, but you can have it wait for the extra content to load. It is also very powerful: you can use it to click buttons and enter values into text boxes. I have used it in quite a few cases where I needed to interact with a webpage beyond just scraping the HTML. 
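
To illustrate the waiting approach, here is a minimal sketch rather than a definitive implementation. It assumes the `selenium` package plus Chrome and a matching driver are available to the Python tool. The function names `fetch_rendered_html` and `extract_rows` are made up for this example, and the `td.rowLabel` selector is only a guess based on the HTML posted above. The second helper uses the standard library to group cell text by row, which makes it easy to check whether each row really contains all of its `<td>` cells.

```python
from html.parser import HTMLParser


def fetch_rendered_html(url, wait_css="td.rowLabel", timeout=15):
    """Load url in headless Chrome and return the HTML once wait_css is present."""
    # Imported here so the parsing helper below also works without Selenium.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")  # assumes a recent Chrome version
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Block until at least one matching element exists in the DOM.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, wait_css))
        )
        return driver.page_source
    finally:
        driver.quit()


class RowCollector(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td":
            if self._row is None:  # tolerate a <td> without an opening <tr>
                self._row = []
            self._cell = []

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None and self._row is not None:
            # Join fragments from nested tags and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def extract_rows(html):
    """Return a list of rows, each a list of cell strings."""
    parser = RowCollector()
    parser.feed(html)
    parser.close()
    return parser.rows
```

Calling `extract_rows(fetch_rendered_html(url))` and comparing the number of cells per row against what the browser shows is one way to confirm whether the missing values only appear after the page has finished loading.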

 

As a final note, you may want to check whether the webpage has an API available. An API is always preferable to web scraping, because API calls are more resilient to changes in data structure. If someone changes the layout or content of a webpage, your workflow may no longer find the tags it relied on previously. Here is a helpful article on API calls: https://community.alteryx.com/t5/Alteryx-Designer-Knowledge-Base/APIs-in-Alteryx-cURL-and-Download-T... 
