I am trying to download multi-page coronavirus-related data from the UK government website (see the pagination section of the developer guide at: https://coronavirus.data.gov.uk/details/developers-guide#params-page ).
The attached macro seems to work properly on the first pass/page, and I am able to parse the data into sensible columns for further analysis and vizzing. But the macro does not seem to iterate over subsequent page calls, so I am only getting local areas that start with A or B so far.
Suspect 1: I think the most likely problem is something in the configuration of the macro via the interface designer.
Suspect 2: I am also unsure how to construct the original URL text in the .yxmd file so that it contains a reference to possibly multiple pages.
Suspect 3: I had some trouble constructing the URL for the calls beyond page 1 due to an incomprehensible (to me) truncation of the string, so I have used a kludge: I create a new field and then rename it back to URL. I think this is OK now, but it could be part of the problem, I guess.
For the moment, I am just trying to get the current cases data, but once I can retrieve multiple pages reliably for that dataset, I will expand to other columns. The use case is predicting school absences and comparing the case data with attendance data to identify anomalies in attendance patterns.
Thank you for any insight you may be able to provide.
Hi @dfurlow ,
For me the macro appears to run fine: it gets from page 1 to page 2 and downloads the data without problems (10k records per page; the macro output gives 20k records, so 2 pages).
The macro breaks when trying to get to page 3, where I get a 500 server error.
Is it the same for you as well, or have you noted another issue?
Hi @dfurlow
First of all, I have to say this is a great project.
Here is my solution. I have to admit I cheated a little bit; mine is kind of a hacky way.
After reading the documentation, I tried to pass the page number in the request, but it seemed the positioning of the parameter was off. So I took a look into your macro to see exactly how it was calling a new page and what URL it used to call the next page. I got this:
https://api.coronavirus.data.gov.uk/v1/data?filters=areaType=utla&structure=%7B%22date%22:%22date%22%2C%22areaName%22:%22areaName%22%2C%22areaCode%22:%22areaCode%22%2C%22newCasesByPublishDate%22:%22newCasesByPublishDate%22%7D&format=json&page=2
This immediately showed me the positioning, so I just added &page=1 to the end of the request:
https://api.coronavirus.data.gov.uk/v1/data?filters=areaType=utla&structure={"date":"date","areaName":"areaName","areaCode":"areaCode","newCasesByPublishDate":"newCasesByPublishDate"}&page=1
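For anyone reproducing this outside Alteryx, here is a minimal Python sketch of how the paged URL above could be assembled. The function name `build_page_url` and the constants are hypothetical names chosen for illustration; the base URL, filter, and structure fields are the ones quoted in this thread. It is only a sketch of the string construction, not a statement of how the macro builds the URL.

```python
import json
from urllib.parse import urlencode

# Endpoint and field names are taken from the URLs quoted in this thread.
BASE_URL = "https://api.coronavirus.data.gov.uk/v1/data"

STRUCTURE = {
    "date": "date",
    "areaName": "areaName",
    "areaCode": "areaCode",
    "newCasesByPublishDate": "newCasesByPublishDate",
}


def build_page_url(page: int) -> str:
    """Return the request URL for one page (hypothetical helper)."""
    params = {
        "filters": "areaType=utla",
        # Compact JSON, so the encoded form matches the URL seen in the macro.
        "structure": json.dumps(STRUCTURE, separators=(",", ":")),
        "format": "json",
        "page": page,
    }
    # safe="=:" leaves "=" and ":" unescaped, as in the macro's URL;
    # braces, quotes and commas are percent-encoded.
    return f"{BASE_URL}?{urlencode(params, safe='=:')}"


print(build_page_url(2))
```

Building the query from a dictionary keeps the page number as an ordinary parameter, which avoids the string truncation and positioning problems described earlier in the thread.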
And just to be on the safer side, I checked it in the browser, and then checked a few more pages by changing the page number.
Here is my workflow:
I just set it to fetch 50 pages, just in case. There were 28 pages with data; from page 29 onward there was no data.
There were no failures in between, and the workflow took 2.5 minutes to complete.
I would highly suggest you store the downloaded data in a file and then process it.
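The "request up to 50 pages and see which ones have data" idea can be sketched in Python roughly as follows. This is an illustration under assumptions, not the Alteryx workflow itself: `fetch_until_empty` and `fetch_page` are hypothetical names, `fetch_page` stands in for whatever performs the HTTP call, and the sketch assumes each response is a JSON object with its rows under a `"data"` key (or nothing at all once the pages run out).

```python
from typing import Callable, Dict, List, Optional

# None stands for "the server returned no content for this page".
Payload = Optional[Dict]


def fetch_until_empty(
    fetch_page: Callable[[int], Payload], max_pages: int = 50
) -> List[Dict]:
    """Request pages 1..max_pages and stop at the first page without rows."""
    records: List[Dict] = []
    for page in range(1, max_pages + 1):
        payload = fetch_page(page)
        # Assumption: rows live under "data"; a missing or empty list means "done".
        rows = (payload or {}).get("data") or []
        if not rows:
            break
        records.extend(rows)
    return records
```

Unlike a fixed list of 50 requests, this stops as soon as a page comes back empty, so the cap only acts as a safety limit and would need raising only if the dataset ever grew past it.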
Hope this helps 🙂 Let me know how it goes
If this post helps you, please mark it as the solution. And give it a like if you don't mind 😀👍
All the best with your project 🙂 and thanks for introducing this link to me 😀
Thank you for the feedback. It seems to be running a third iteration but then fails at some subsequent iteration. Maybe I am not understanding how to break the macro loop? The gov server delivers a JSON field called "pagination.next"; I am detecting that and using it to increment the page number in the call.
So it seems to have <started> the third iteration... but then when the macro terminates, the text message on the last line of the above screenshot reverts to "2 iteration have been run"...
@atcodedog05 points out that there are 28 or 29 pages to retrieve, which sounds about right based on the historic data sets I was previously downloading as CSVs, before they improved the range of data available and moved it to an API instead of a straight file download. So I'm wondering how to terminate the macro. I will review the Alteryx documentation and check web presentations to see if I can find this info. Thank you @AngelosPachis and @atcodedog05 for looking into this. Once I get this working I will share it. The government seems to be providing Python, JS and R code (in due course), but a lot of the screenshots on their doc pages seem to be clipped, so I can't actually see the code they are documenting... ironic, really.
If the server has no more pages, pagination.next will return null, and that's when the macro should, in theory, terminate. In practice, though, maybe I need to insert a switch somewhere to terminate the macro.
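The termination rule described here (keep going while pagination.next has a value, stop when it is null) could look like this in Python. It is a sketch under assumptions: `follow_next_links` and `fetch_url` are hypothetical names, `fetch_url` is assumed to return the parsed JSON for whatever value `next` holds, and rows are assumed to sit under a `"data"` key. The `max_pages` cap plays the role of the "switch" that forces the loop to end even if the server never returns null.

```python
from typing import Callable, Dict, List, Optional


def follow_next_links(
    fetch_url: Callable[[str], Dict], first_url: str, max_pages: int = 1000
) -> List:
    """Collect rows page by page until pagination.next is null."""
    records: List = []
    url: Optional[str] = first_url
    pages_read = 0
    # Two exit conditions: the server says there is no next page,
    # or the safety cap is reached.
    while url is not None and pages_read < max_pages:
        payload = fetch_url(url)
        records.extend(payload.get("data") or [])
        # Assumption: the next link sits at payload["pagination"]["next"].
        url = (payload.get("pagination") or {}).get("next")
        pages_read += 1
    return records
```

In an iterative macro, the equivalent is to send a record to the iteration output only while the next value is non-null, and to set a maximum number of iterations as the backstop.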
Hi @dfurlow
I would suggest a single workflow instead of an iterative macro. The reason is that an iterative macro fetches one request at a time, whereas directly loading multiple endpoints fetches several at a time and is faster. The workflow I have provided already fetches the data; you just need to parse it. That would be my suggestion, but do look into it yourself, see which one is faster, and let me know.
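To illustrate the difference outside Alteryx, here is a rough Python sketch of fetching several known page numbers at once rather than one after another. The names `fetch_pages_concurrently` and `fetch_page` are hypothetical, `fetch_page` stands in for the actual HTTP call, and the `"data"` key is an assumption about the response shape. It only works when the page numbers are known up front, which is the trade-off against following pagination.next.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List, Optional


def fetch_pages_concurrently(
    fetch_page: Callable[[int], Optional[Dict]],
    pages: Iterable[int],
    max_workers: int = 4,
) -> List:
    """Fetch the given pages in parallel and return their rows in page order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the order of the input pages.
        payloads = list(pool.map(fetch_page, pages))
    records: List = []
    for payload in payloads:
        # Pages past the end are assumed to come back empty or as None.
        records.extend((payload or {}).get("data") or [])
    return records
```

A small `max_workers` value keeps the load on the server modest; whether this is actually faster than the iterative approach is worth measuring, as suggested above.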
From my experience with API documentation, providers usually document integration in JS (web apps), Python (programs/web apps), and Java (Android apps).
You can actually use the API in any language provided you know how to configure the endpoint.
Hope this helps 🙂
All the best, looking to hear back from you about the progress.
Kudos to @atcodedog05 and @AngelosPachis for being so helpful. I was able to adapt the suggestions from @atcodedog05 to obtain a full result set. This solution is, as described by @atcodedog05, "hacky" because it will require future maintenance if/when the dataset grows beyond the current 50 pages manually specified in the workflow. @AngelosPachis's generous debugging showed me that the iterative macro was actually iterating, but I was not able to determine how to advance beyond the second iteration. In theory, the iterative approach should be self-maintaining, since the covid data site's API provides a null value for the next token once all pages have been downloaded. But since I couldn't get past whatever is happening in the second or third iteration, and since the UK government site is evolving and may therefore require further updates/maintenance in any case, I decided to slightly modify @atcodedog05's solution to obtain a useful data set. I can now go lobby the site to supply more fields, in particular the school attendance data I am after. A big thank you to you both for your help; I hope this thread may prove useful to other COVID data researchers who may be using Alteryx.
Happy to help 🙂 @dfurlow
Cheers and Happy Analyzing 😀
Feel free to reach out if you face any issues 🙂