Is there a record number limitation in web scraping?
Start Date | End Date | Record # |
1/1/2016 | 8/25/2016 | 223 |
8/1/2016 | 8/25/2016 | 240 |
I scraped data from Crimson Hexagon. I obtained fewer records (223) for the longer period (1/1/2016 - 8/25/2016) than for the shorter period (240 for 8/1/2016 - 8/25/2016).
Does anyone know why that happens and how to approach this issue?
Thank you,
Kazumi
If there is a limitation, I doubt it is on the Alteryx side. I don't know much about Crimson Hexagon, but I've been able to scrape many more records than that from other sites.
What has limited my scraping in the past is the host restricting the number of API calls/HTTP requests per minute, hour, etc. In that case, you just need to drop a Throttle tool in and limit the number of requests per unit of time.
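For anyone scripting this outside of Alteryx, the same throttling idea can be sketched in Python. This is only a rough illustration: the `Throttle` class and `fetch_throttled` helper are hypothetical names, and the actual limit depends on whatever the host enforces.

```python
import time


class Throttle:
    """Space calls so that at most `max_calls` happen per `period` seconds.

    Hypothetical helper for illustration; not part of Alteryx or any API.
    """

    def __init__(self, max_calls, period, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = period / max_calls  # minimum seconds between calls
        self.clock = clock
        self.sleep = sleep
        self.last_call = None

    def wait(self):
        """Block until enough time has passed since the previous call."""
        now = self.clock()
        if self.last_call is not None:
            remaining = self.min_interval - (now - self.last_call)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self.last_call = now


def fetch_throttled(urls, fetch, throttle):
    """Call `fetch(url)` for each URL, pausing between requests as needed.

    `fetch` is whatever function sends the request (an assumption here).
    """
    results = []
    for url in urls:
        throttle.wait()
        results.append(fetch(url))
    return results
```

With something like `Throttle(max_calls=30, period=60)`, requests would be spaced at least two seconds apart. Evenly spacing the calls is a simple choice; a host that allows bursts could be handled differently.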
Do you have a workflow and/or output log that you can share?
Hi Patrick,
Thank you for your response. I figured out why it happened. Crimson Hexagon's API returns only 500 posts per call. Therefore, it is not an Alteryx issue.
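One possible way to work around a per-call cap like this is to split the date range into smaller windows whenever a call comes back full. Below is a minimal Python sketch of that idea. It is not Crimson Hexagon's actual API: `fetch_posts(start, end)` is a hypothetical function standing in for the real request, and treating both dates as inclusive is an assumption.

```python
from datetime import date, timedelta

MAX_POSTS_PER_CALL = 500  # the per-call cap mentioned above


def fetch_all(fetch_posts, start, end, cap=MAX_POSTS_PER_CALL):
    """Collect posts for [start, end] by halving any window that hits the cap.

    `fetch_posts(start, end)` is a hypothetical callable that returns a list
    of at most `cap` posts for the inclusive date range.
    """
    posts = fetch_posts(start, end)
    # Fewer posts than the cap means nothing was cut off. A single-day
    # window cannot be split further, so it is returned as-is and may
    # still be truncated.
    if len(posts) < cap or start >= end:
        return posts
    mid = start + timedelta(days=(end - start).days // 2)
    first_half = fetch_all(fetch_posts, start, mid, cap)
    second_half = fetch_all(fetch_posts, mid + timedelta(days=1), end, cap)
    return first_half + second_half
```

For example, `fetch_all(fetch_posts, date(2016, 1, 1), date(2016, 8, 25))` would keep splitting the period until each window returns fewer than 500 posts. If the API offers paging or an offset parameter, that would likely be a better fit than splitting by date.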
Sincerely,
Kazumi