Hello Guys,
I'm trying to scrape all the records from a website using the Download tool, but each webpage only shows a limited number of records.
For example, the first page has 30 records, and I can scrape only those 30 with Alteryx. However, there are 2000+ records in total spread across pages: page 1 = 30 records, page 2 = next 30 records, page 3 = next 30 records, and so on.
Is it possible to scrape all the 2000+ records from the website?
Thanks in Advance,
Venkatesh
Hi Venkatesh23,
Is the URL structure always consistent, with just a different page number each time? For example, page 1 is www.exampleurl.com/page/1 and page 2 is www.exampleurl.com/page/2
If so, you can create a template URL like www.exampleurl.com/page/, then use the Generate Rows tool to produce a list of numbers for all the necessary pages, and append each number to the template so that you end up with a list of URLs covering every page. If you then pass this list of URLs to the Download tool, you should be able to get all the data you need.
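Outside of Alteryx, the same idea can be sketched in a few lines of Python. This is just an illustration of the logic, not the workflow itself: the base URL, total record count, and page size below are assumptions taken from the example figures in this thread.

```python
import math

# Hypothetical values based on the example in this thread.
BASE_URL = "http://www.exampleurl.com/page/"  # template URL, page number appended
TOTAL_RECORDS = 2000                          # approximate total on the site
RECORDS_PER_PAGE = 30                         # records shown per page

# Equivalent of the Generate Rows tool: one row (number) per page.
num_pages = math.ceil(TOTAL_RECORDS / RECORDS_PER_PAGE)
page_urls = [f"{BASE_URL}{n}" for n in range(1, num_pages + 1)]

# Each URL in this list would then be fed to the Download tool
# (or an HTTP client) to fetch that page's 30 records.
print(num_pages)      # 67
print(page_urls[0])   # http://www.exampleurl.com/page/1
print(page_urls[-1])  # http://www.exampleurl.com/page/67
```

In Alteryx terms, Generate Rows plays the role of the `range(...)` call, and a Formula tool concatenating the template with the page number plays the role of the f-string.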
An alternative but similar way is to use an iterative macro to replace the page number for each iteration.
Let me know if you have any questions about this or if you want me to show you an example workflow.
Josh
Thanks @JoshuaGostick for the quick response.
Please, could you share a sample workflow with me?
Thanks,
Venkatesh.