Hi everyone,
I am a new Alteryx user, and I would like to scrape an HTML page with Alteryx.
I have also just started learning about web scraping.
The website that I would like to scrape uses a GET request, and I don't know how to use or configure this kind of request.
I have tried the Download tool, but I did not manage to retrieve the page source containing the information I need.
Thank you for helping me scrape this website.
Hello @ByranCarter12,
Have you checked this solution:
https://community.alteryx.com/t5/Alteryx-Designer-Discussions/Web-Scrapping/td-p/604820
It is a good example of how to get the data using a GET request.
Regards
Hi @ByranCarter12,
You can use the Download tool to retrieve the source code of a web page.
Please find attached a sample workflow extracting the HTML code of this specific Community thread.
It also helps to set the Headers section in the Download tool. You can use Chrome's Inspect feature (Network>>All>>Headers) to see the headers sent for a given page.
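For anyone who wants to see the same idea outside Alteryx, here is a minimal Python sketch of a GET request that carries custom headers. The URL and the header values are placeholders I made up for illustration; you would copy the real ones from the Network tab. This is only one way to do it, using the standard library.

```python
from urllib.request import Request, urlopen

# Hypothetical target; replace with the page you want to download.
URL = "https://www.example.com/"

# Example header values only; copy the real ones from the browser's Network tab.
HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Accept": "text/html",
    "Accept-Language": "en-GB,en;q=0.9",
}


def build_request(url: str, headers: dict) -> Request:
    """Create a GET request carrying the given headers (nothing is sent yet)."""
    return Request(url, headers=headers, method="GET")


def fetch_html(request: Request, timeout: float = 10.0) -> str:
    """Send the request and return the page source as text."""
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


request = build_request(URL, HEADERS)
print(request.get_method(), request.full_url)
# html = fetch_html(request)  # uncomment to actually download the page
```

The headers here play the same role as the name/value pairs entered in the Download tool's Headers section.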
@afv2688, thank you for your answer. I have read the post and tried to use the Download tool, but it does not seem to work.
In fact, I need to specify search criteria on the website, such as location and brands, before it gives me any information.
Usually when you do a search on a website, you get a URL like www.website_name.com/?country=UK&brands=whatever
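To make that URL pattern concrete, here is a small Python sketch that splits such a URL into the host and the search criteria carried in its query string. The `https://` prefix is an assumption added so the URL parses cleanly; the rest is the example URL from above.

```python
from urllib.parse import urlsplit, parse_qs

# The example URL from above, with an assumed https:// scheme.
url = "https://www.website_name.com/?country=UK&brands=whatever"

parts = urlsplit(url)
criteria = parse_qs(parts.query)  # each parameter maps to a list of values

print(parts.netloc)  # www.website_name.com
print(criteria)      # {'country': ['UK'], 'brands': ['whatever']}
```

Everything after the `?` is the part a GET request uses to pass the criteria to the server.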
But in my case, this information does not appear in the URL, so I don't know whether a script sends these criteria separately.
In any case, the HTML code I download does not contain the search results, only the search criteria themselves, for example: input type = "checkbox"
I do not know whether I should build a specific request that includes these criteria when downloading the code. That is why I was asking whether it is possible to parameterize a request with the GET method.
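As a rough illustration of what "parameterizing a GET request" means, the Python sketch below builds a search URL from a set of criteria. The base URL, the parameter names (`country`, `brands`) and the brand values are hypothetical; the real names would have to come from the site itself, for instance from the `name` attributes of its form inputs or from the request shown in the browser's Network tab.

```python
from urllib.parse import urlencode

# Placeholder base URL; not a real site.
BASE_URL = "https://www.website_name.com/"


def build_search_url(base_url: str, criteria: dict) -> str:
    """Append the criteria to the base URL as a query string."""
    # doseq=True repeats the key for list values, as a multi-select checkbox would.
    query = urlencode(criteria, doseq=True)
    return f"{base_url}?{query}" if query else base_url


# Hypothetical criteria and values, for illustration only.
url = build_search_url(BASE_URL, {"country": "UK", "brands": ["brand_a", "brand_b"]})
print(url)
# https://www.website_name.com/?country=UK&brands=brand_a&brands=brand_b
```

A URL built this way could then be passed to the Download tool as its URL field. If the criteria never show up in the URL, the site may be sending them in a separate background request or with POST instead of GET, in which case the Network tab should show which request actually returns the results.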
Thank you.
My experience with web scrapes: you can do it in Alteryx, but it can be a bit cumbersome (and I had a few ACEs from one of Alteryx's partners help set up my Alteryx-derived solution). UiPath can be much quicker. Plus, it exports to Excel, and you can then use Alteryx for further analysis. But without knowing your target URL, it is a bit difficult to troubleshoot. That said, I'd trust @afv2688 first.