Alteryx Designer Discussions

Find answers, ask questions, and share expertise about Alteryx Designer.
SOLVED

Regex tool giving Null results on Web scraping workflow

Hi,

 

My goal is to extract the data between the <td> and </td> tags on the website (workflow attached).

The problem is that the tables are spread across multiple pages.

Is there a ready-made macro for this?

 

BR

Castor

Hi @Andre_Liboreiro 

 

There isn't an existing macro for paging through website results, since each site implements pagination differently.

 

In your particular case, the website designer didn't bother with postbacks or anything fancy. The entire table is downloaded in the initial response, and the JavaScript just displays different data depending on the selected page.

 

Since the data is all available in the first response, it's just a question of parsing it out of the returned HTML.
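
For anyone who wants to check the same idea outside Designer, here is a minimal Python sketch of pulling every table cell out of a single response. It uses only the standard library's html.parser. The class name TableCellParser, the sample_html string and its cell values are made up for illustration; they are not taken from the actual site.

```python
from html.parser import HTMLParser


class TableCellParser(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside an open cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None  # drop any cell left open by malformed HTML


# Made-up stand-in for the HTML body returned by the initial request.
sample_html = """
<table>
  <thead><tr><th>Col A</th><th>Col B</th></tr></thead>
  <tbody>
    <tr><td>a1</td><td>b1</td></tr>
    <tr><td>a2</td><td>b2</td></tr>
  </tbody>
</table>
"""

parser = TableCellParser()
parser.feed(sample_html)
header, *body = parser.rows
print(header)
print(body)
```

Because every page of the table is in that one response, a single pass like this sees all the rows, and no paging logic is needed.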

 

wf.png

 

The top container gets the response, finds the table and marks the header and body sections using a couple of Multi-Row tools. The middle one parses the header info and cross-tabs the results to put it all on one row. The last one uses a couple of Multi-Row tools to tag each row and the columns within each row. After the final Cross Tab, the header and data rows are unioned together and the Dynamic Rename takes the column names from the header row.
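
The tag-then-pivot logic described above can be sketched roughly in Python as follows. This is not the workflow itself, just an illustration of the same steps under some assumptions: the input is assumed to be already split into one HTML fragment per record, the header is simply treated as the first row rather than being handled in a separate container, and the names lines, tag_cells and cross_tab and all cell values are invented.

```python
import re

# Made-up input: one HTML fragment per record, as it might look after
# splitting the downloaded response to rows.
lines = [
    "<thead>", "<tr>", "<th>Col A</th>", "<th>Col B</th>", "</tr>", "</thead>",
    "<tbody>",
    "<tr>", "<td>a1</td>", "<td>b1</td>", "</tr>",
    "<tr>", "<td>a2</td>", "<td>b2</td>", "</tr>",
    "</tbody>",
]

# Matches a single <td>...</td> or <th>...</th> cell and captures its text.
CELL = re.compile(r"<t[dh]\b[^>]*>(.*?)</t[dh]>", re.IGNORECASE | re.DOTALL)


def tag_cells(lines):
    """Keep running row/column counters, similar to a Multi-Row formula."""
    row_id = 0
    col_id = 0
    tagged = []
    for line in lines:
        if line.lower().startswith("<tr"):
            row_id += 1   # a new table row starts
            col_id = 0    # column numbering restarts within each row
            continue
        match = CELL.search(line)
        if match:
            col_id += 1
            tagged.append((row_id, col_id, match.group(1).strip()))
    return tagged


def cross_tab(tagged):
    """Pivot (row_id, col_id, text) records into one list per row."""
    rows = {}
    for row_id, col_id, text in tagged:
        rows.setdefault(row_id, {})[col_id] = text
    return [[rows[r][c] for c in sorted(rows[r])] for r in sorted(rows)]


table = cross_tab(tag_cells(lines))
header, data = table[0], table[1:]

# Rough equivalent of the Dynamic Rename: field names come from the header row.
records = [dict(zip(header, row)) for row in data]
print(records)
```

The row and column counters play the role of the Multi-Row tools, and the dictionary pivot stands in for the Cross Tab.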

 

Here's the start of the data:

 

start.png

 

And the end, showing the most recent data for June 6th:

 

end.png

 

You'll notice that I left the task of replacing the URL-encoded characters in the column names for you.
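
As a hint for that last step, percent-encoded names can be decoded with Python's urllib.parse.unquote. The column names below are invented examples, not the real headers from the site, so treat this as a sketch of the technique only.

```python
from urllib.parse import unquote

# Invented column names containing percent-encoded characters.
encoded_columns = ["Col%20A", "Rate%20%28%25%29"]

# unquote() turns each %XX escape back into the character it represents.
decoded = [unquote(name) for name in encoded_columns]
print(decoded)
```

Here %20 is a space, %28 and %29 are parentheses, and %25 is a literal percent sign.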

 

Dan

Thanks Dan, I did not realize that all the data was already there.

 

BR

 

André
