
SOLVED

Web Scraping and Number of Record Limitation

knozawa
11 - Bolide

Is there a limit on the number of records in web scraping?

 

Start Date    End Date     Record #
1/1/2016      8/25/2016    223
8/1/2016      8/25/2016    240

 

I scraped data from Crimson Hexagon. I obtained fewer records (223) for the longer period (1/1/2016 - 8/25/2016) than for the shorter period (240 for 8/1/2016 - 8/25/2016).

Does anyone know why that happens and how to approach this issue?

 

Thank you,

Kazumi

patrick_mcauliffe
14 - Magnetar

If there is a limitation, I doubt it is on the Alteryx side.  I don't know much about Crimson Hexagon, but I've been able to scrape many more records than that from other sites.

What has limited my scraping in the past is the host restricting the number of API calls or HTTP requests per minute, hour, etc. In that case, you just need to drop a Throttle tool in and limit the number of requests per unit of time.

Do you have a workflow and/or output log that you can share?
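
For anyone doing the same thing outside Alteryx, the throttling idea above can be sketched in Python roughly as follows. This is only a minimal illustration, not how the Alteryx Throttle tool is implemented: the function name `throttle` and its parameters are hypothetical, and the real per-minute or per-hour limit has to come from the host's documentation.

```python
import time
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def throttle(
    items: Iterable[T],
    max_per_period: int,
    period_seconds: float,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[T]:
    """Yield items spaced evenly, about max_per_period per period_seconds.

    Hypothetical helper: `clock` and `sleep` are injectable only so the
    spacing can be checked without actually waiting.
    """
    if max_per_period < 1:
        raise ValueError("max_per_period must be at least 1")
    # Minimum gap between two consecutive items.
    min_interval = period_seconds / max_per_period
    next_allowed = clock()
    for item in items:
        now = clock()
        if now < next_allowed:
            # Too early for the next request: wait out the remainder.
            sleep(next_allowed - now)
        next_allowed = max(now, next_allowed) + min_interval
        yield item
```

A loop such as `for url in throttle(urls, max_per_period=60, period_seconds=60.0): ...` would then issue about one request per second, where `urls` is whatever list of request URLs you have built.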

 

knozawa
11 - Bolide

Hi Patrick,

 

Thank you for your response. I figured out why it happened. Crimson Hexagon's API returns only 500 posts per call. Therefore, it is not an Alteryx issue.

 

Sincerely,

Kazumi
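
One possible way to work around a per-call cap like the 500 posts mentioned above is to split the date range into smaller windows and make one call per window. The sketch below only generates the windows; it does not call Crimson Hexagon. The helper name `date_windows` is hypothetical, and treating each window as start-inclusive and end-exclusive is an assumption that should be checked against the API's own documentation.

```python
from datetime import date, timedelta
from typing import Iterator, Tuple


def date_windows(
    start: date, end: date, days_per_window: int
) -> Iterator[Tuple[date, date]]:
    """Yield consecutive (window_start, window_end) pairs covering start..end.

    Assumption: windows are half-open, so each window_end is the next
    window_start and no day is requested twice.
    """
    if days_per_window < 1:
        raise ValueError("days_per_window must be at least 1")
    step = timedelta(days=days_per_window)
    current = start
    while current < end:
        # The last window is clipped so it never runs past `end`.
        window_end = min(current + step, end)
        yield current, window_end
        current = window_end


# Example: the shorter period from the question, in 10-day windows.
windows = list(date_windows(date(2016, 8, 1), date(2016, 8, 25), 10))
```

Each pair in `windows` would become the start and end date of one request. If a single window still comes back with exactly 500 posts, it has probably hit the cap as well and may need to be split further.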
