Alteryx Designer Desktop Discussions

SOLVED

Web Scraping results in truncated DownloadData

asyraf__razak
7 - Meteor

I am new to Alteryx and I am trying to extract data from this webpage: https://ecomm.sirim.my/SirimEnquiry/search_model.aspx

The problem is that the webpage requires the user to enter a specific search term. Even after searching for, say, "samsung", the URL of the page does not change. Can I copy the URL directly from the search page itself? When I do, the DownloadData values are truncated and DownloadHeaders reports an internal server error.

 

To simplify:

1. Can I directly copy a URL from a webpage that contains a search? (After searching for a term, the URL of the page remains the same.)

2. How can I prevent DownloadData from being truncated?

3. How can I make sure DownloadHeaders succeeds?

[Screenshot: webpage after searching something]

[Screenshot: link does not change after the search]

[Screenshot: inserted URL, copied directly from the website]

[Screenshot: truncated characters in DownloadData and server error in DownloadHeaders]

3 REPLIES
RishiK
Alteryx

Hi @asyraf__razak 

 

The issue here is that the search does not produce a web address of its own, so there is no URL for the results page that we can pass to the Download tool.
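To illustrate why the address stays the same: pages like this typically submit the search as form data to the same URL rather than putting it in the link. The Python sketch below (outside Alteryx, standard library only) shows the general idea of collecting a form's hidden fields and building a POST request that carries the search term. The sample HTML, the `txtSearch` field name, and the `__VIEWSTATE` value are made up for illustration; the real page's field names and requirements are not known here and would need to be checked in the browser's developer tools. No request is actually sent.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode
from urllib.request import Request


class HiddenFieldParser(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def build_search_request(url, page_html, search_field, search_term):
    """Build (but do not send) a form POST that carries the search term."""
    parser = HiddenFieldParser()
    parser.feed(page_html)
    form = dict(parser.fields)        # hidden fields the server expects back
    form[search_field] = search_term  # the value typed into the search box
    body = urlencode(form).encode("utf-8")
    return Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


# Made-up HTML standing in for the real search page (an assumption).
SAMPLE_PAGE = """
<form method="post" action="search_model.aspx">
  <input type="hidden" name="__VIEWSTATE" value="abc123" />
  <input type="text" name="txtSearch" />
</form>
"""

request = build_search_request(
    "https://ecomm.sirim.my/SirimEnquiry/search_model.aspx",
    SAMPLE_PAGE,
    "txtSearch",  # hypothetical field name
    "samsung",
)
```

The request URL is identical to the search page's URL; only the body differs, which is why copying the address from the browser does not reproduce the search. A real site may also require cookies or additional fields, so treat this as a starting point rather than a working scraper.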

 

Have a look at this article, as it might help you use ParseHub to collect the data and then bring it back into Alteryx. Once you have your data back, the attached sample workflow, which scrapes a static website, can be used to extract the table you want.
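For readers who want to see the table-extraction step spelled out, here is a minimal Python sketch of the same idea using only the standard library. It is not the attached workflow, and the sample HTML and its values are invented for illustration; a real results page may have nested tables or different markup.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def extract_rows(html):
    """Return the table contents as a list of rows (lists of cell strings)."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    return parser.rows


# Invented sample standing in for downloaded page content.
SAMPLE = """
<table>
  <tr><th>Brand</th><th>Model</th></tr>
  <tr><td>ExampleBrand</td><td>X-100</td></tr>
</table>
"""

rows = extract_rows(SAMPLE)
```

The first row holds the headers and the remaining rows hold the data, which maps naturally onto columns and records once the result is brought back into a workflow.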

danilang
19 - Altair

Hi @asyraf__razak 

 

I think this was the article that @RishiK was referring to. 

 

Dan

RishiK
Alteryx

My bad, thanks @danilang - I really just wanted you to post as well (shout out to @OliverW) 😉
