Alteryx Designer Desktop Discussions

Marp · 03-09-2021

Hi All

I need help with web scraping.

I see some missing values (data) when web scraping with Alteryx.

The table below compares the original page source with the HTML returned by Alteryx web scraping. The Alteryx output is missing the entire row value highlighted in blue.

Please advise.

Alteryx flow -

Web Url Source	Alteryx Webscraping
<td class="rowLabel" style="width: 210px;">Project:</td>	<td class="rowLabel" style="width: 210px;"Project:</td>
<td>	<td>
<span class="drop_hilite">August 2017</span>	<span class="drop_hilite">August 2017</span> <span class="add_hilite">December 2019</span> </td>
</td>	</tr>
<td>	<tr>
<span class="add_hilite">December 2019</span>	<td class="rowLabel" style="width: 210px;">Status:</td>
</td>	<td>
</tr>	<span class="add_hilite">Active, </span>not <span class="drop_hilite">yet </span> recruiting </td>
<tr>	</tr>
<td class="rowLabel" style="width: 210px;">Status:</td>	<tr>
<td>	<td class="rowLabel" style="width: 210px;"Start:</td>
Not <span class="drop_hilite">yet </span>recruiting	<td>
</td>
<td>
<span class="add_hilite">Active, </span>not recruiting
</td>
</tr>
<tr>
<td class="rowLabel" style="width: 210px;"Start:</td>

BrandonB · 03-09-2021

This webpage likely generates some of its content dynamically after the initial page load, in which case the Download tool would not capture all of the information. If so, you may want to use the Python tool with Selenium instead. Here is a helpful article that walks you through the process: https://community.alteryx.com/t5/Alteryx-Designer-Discussions/Python-Code-Tool-Web-Scraping-Dynamic-...

Selenium scrapes the HTML just as the Download tool does, but it can wait for the extra content to load first. It is also very powerful beyond that: you can use it to click buttons and pass values into text boxes. I have used it for quite a few use cases where I needed to interact with a webpage rather than just scrape its HTML.
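To make the "wait for the extra content" idea concrete, here is a rough sketch rather than a finished solution. It assumes Selenium 4 with Chrome available on the machine, and the names `fetch_rendered_html`, `extract_rows`, and the `td.rowLabel` selector are my own illustrations based on the HTML in the question, not something taken from the linked article. The second helper uses only the standard library to list the label/value cells, so you can check whether any rows are still missing.

```python
from html.parser import HTMLParser


def fetch_rendered_html(url, wait_css="td.rowLabel", timeout=20):
    """Open url in headless Chrome, wait until wait_css exists, return the HTML."""
    # Imported here so the parsing helper below still works without selenium.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")  # assumption: a recent Chrome build
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Block until at least one matching element is present in the DOM.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, wait_css))
        )
        return driver.page_source
    finally:
        driver.quit()


class _RowParser(HTMLParser):
    """Collects (label, value) pairs from <td class="rowLabel"> table rows."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._cell = None   # "label", "value", or None when outside a <td>
        self._label = None  # label of the row currently being read
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._label = None
        elif tag == "td":
            is_label = dict(attrs).get("class") == "rowLabel"
            self._cell = "label" if is_label else "value"
            self._buf = []

    def handle_data(self, data):
        if self._cell:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag != "td" or self._cell is None:
            return
        text = " ".join("".join(self._buf).split())  # collapse whitespace
        if self._cell == "label":
            self._label = text.rstrip(":")
        elif self._label is not None:
            self.rows.append((self._label, text))
        self._cell = None


def extract_rows(html):
    """Return every (label, value) pair found in the given HTML."""
    parser = _RowParser()
    parser.feed(html)
    return parser.rows
```

You would call `extract_rows(fetch_rendered_html(your_url))` and compare the result against what you see in the browser. If a row has two value cells (old and new), it should come back as two pairs with the same label.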

As a final note, you may want to check whether the webpage has an API available. An API is generally preferable to web scraping because its responses keep a stable data structure. With scraping, someone can change the layout or content of the webpage, and your workflow will no longer find the tags it was using previously. Here is a helpful article on API calls: https://community.alteryx.com/t5/Alteryx-Designer-Knowledge-Base/APIs-in-Alteryx-cURL-and-Download-T...
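As a small illustration of why an API tends to be more resilient, the sketch below reads fields by name from a JSON response instead of by tag position. I do not know whether this site has an API, so the field names (`status`, `start`) are purely hypothetical, and `fetch_json` and `pick_fields` are names I made up for the example.

```python
import json
from urllib.request import Request, urlopen


def fetch_json(url, timeout=30):
    """GET url and decode the JSON body (assumes the endpoint returns JSON)."""
    request = Request(url, headers={"Accept": "application/json"})
    with urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def pick_fields(record, fields):
    """Keep only the named fields; a missing field becomes None, not an error."""
    return {name: record.get(name) for name in fields}


# Hypothetical record shape, used only to show lookup by key.
example = {"status": "Active, not recruiting", "start": "December 2019"}
print(pick_fields(example, ["status", "start"]))
```

Because the lookup is by key, a redesign of the page's HTML would not affect it, as long as the API itself keeps the same field names.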


Webscraping missing data