Alteryx Designer Discussions

Find answers, ask questions, and share expertise about Alteryx Designer.
SOLVED

Can't get the Download tool to retrieve a 'blob' CSV file that appears on a webpage but is hidden from the HTML

Asteroid

I'm having a hard time downloading a CSV file from a UK website. Here is the site, https://www.hesa.ac.uk/collection/c18051/a/domicile, and here is an image of the link I want to download in Alteryx (circled in red):

[Image: hesa1.PNG]

The URL appears to resolve to blob:https://www.hesa.ac.uk/9bb84ca8-539b-498e-8b79-44e68cdc0382, so I'm not sure how that gets turned into a CSV link in the browser. Also, the blob URL changes each time the page refreshes. Finally, if you copy and paste the blob URL into an incognito window, you get an error. It's all very strange and I can't figure it out.

Moreover, when I view the HTML page source, there is no trace of the words 'Download valid entries as csv' or of the link. Is there some kind of JavaScript obfuscation going on here? Can anyone figure out how to download this CSV within Alteryx? I know there are some 'Blob' tools, but I can't figure out what to do with them. Thanks

Castor

Hi @jt_edin

If you look here in the source, you'll find the URL:

[Image: HTML.png]

If I paste "blob:https://www.hesa.ac.uk/dd8891ad-d4d2-404c-a20d-b10225f306d3" into Chrome, I get the table. You have to include the "blob:" prefix or you get a 404.

Dan

Asteroid

Thank you for trying, @danilang, but if it were that simple I think I might have got there myself :-)

I can see you're looking at the Inspect Element view in Chrome, but in my initial question I already pointed out that the link in question is not contained in the HTML source code. It is this HTML, after all, that the Download tool in Alteryx would retrieve. If you right-click on the page in Chrome and choose View page source, you'll see what I mean:

[Image: html.PNG]

I have highlighted the part of the HTML where the link should appear, but it seems to be obfuscated within the div with the manual-scrollbar class. Am I missing something? Does it appear when you click View page source?
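
For anyone who wants to repeat this check outside a browser: View page source shows the raw HTML the server sends, while Inspect Element shows the live page after its JavaScript has run. A plain HTTP fetch, which is what the Download tool performs, only ever sees the former. The Python sketch below is a stand-in for that fetch, not a description of the Download tool's internals; it retrieves the raw source and searches it for the link text. The browser-like User-Agent header is an assumption, and the page may have changed since this thread was written, so a live run is not guaranteed to reproduce what is reported here.

```python
from urllib.request import Request, urlopen

# Page and link text taken from the original question.
PAGE_URL = "https://www.hesa.ac.uk/collection/c18051/a/domicile"
LINK_TEXT = "Download valid entries as csv"


def fetch_raw_html(url: str, timeout: float = 30.0) -> str:
    """Return the HTML exactly as the server sends it; no JavaScript is run."""
    # Assumption: a browser-like User-Agent, since some sites reject urllib's default.
    req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def link_in_source(html: str, text: str = LINK_TEXT) -> bool:
    """True if the link text occurs anywhere in the raw source (case-insensitive)."""
    return text.lower() in html.lower()


# Example (makes a live network request, so it is left commented out):
# print(link_in_source(fetch_raw_html(PAGE_URL)))
```

If the text only shows up under Inspect Element, it was added by script after the page loaded.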

 

And even if the link were present in the source code, how would you get Alteryx to turn the blob: link into a CSV as the browser does? How does Chrome know what to do with that link? Thanks

Castor

Hi @jt_edin

Sorry about the confusion earlier.

I did some research on the blob: prefix and found this:

"Blob URI/URL was created by JavaScript, refers to data that your browser currently has in memory (only in current page), and does not refer to data that exists on the host." (source)
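
To make the quoted point concrete: a blob: URL is a handle to data held in the memory of the page that created it, so nothing outside that tab can resolve it. That is consistent with the URL changing on every refresh and failing in an incognito window. As an illustration only, using Python's urllib as a generic HTTP client rather than the Download tool itself, an ordinary fetch of the blob URL from the original question fails before any request is sent, because the client has no handler for the blob scheme.

```python
from urllib.error import URLError
from urllib.parse import urlparse
from urllib.request import urlopen

# Blob URL copied from the original question; it only ever existed in that browser tab.
BLOB_URL = "blob:https://www.hesa.ac.uk/9bb84ca8-539b-498e-8b79-44e68cdc0382"


def try_fetch(url: str) -> str:
    """Attempt an ordinary fetch of `url` and describe the outcome."""
    scheme = urlparse(url).scheme  # "blob" for the URL above
    try:
        with urlopen(url, timeout=10) as resp:
            return f"fetched {len(resp.read())} bytes"
    except URLError as err:
        # urllib has no handler for the blob scheme, so it raises
        # before any network request is made.
        return f"{scheme}: {err.reason}"


if __name__ == "__main__":
    print(try_fetch(BLOB_URL))  # prints "blob: unknown url type: blob"
```

The browser can open the link only because the page's own script created it during that session.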

 

Apparently the blob is built from data already downloaded in the initial request, but you already pointed that out. I have no idea what the JavaScript does to get the data into that final format, but since the data is already in the response, it's just a question of teasing it out using Alteryx.

Here's a workflow that does just that.

[Image: WF.png]

It's a hack and is highly dependent on the structure of the response, but it gives these results, with the code in Country1 and the country name/description in Country2. I'll leave cleaning these up as an exercise for you.

[Image: results.png]

Dan
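
Dan's workflow is only shown as a screenshot, so the following is not a transcription of it. It is a minimal Python sketch of the same regex idea, assuming (hypothetically) that each code/description pair sits in a two-cell table row. The sample markup is invented for illustration; the real HESA response will differ, and the pattern would have to be adjusted to it, which is exactly why this approach is fragile.

```python
import re

# Hypothetical fragment of a raw response; the real markup is not shown in the thread.
SAMPLE_HTML = """
<table>
  <tr><th>Code</th><th>Label</th></tr>
  <tr><td>AA</td><td>Example country A</td></tr>
  <tr><td>BB</td><td>Example country B (with a note)</td></tr>
</table>
"""

# One table row holding exactly two <td> cells: a code and a description.
# Header rows use <th>, so they do not match.
ROW_RE = re.compile(
    r"<tr>\s*<td>\s*(?P<code>[^<]+?)\s*</td>\s*<td>\s*(?P<name>[^<]+?)\s*</td>\s*</tr>",
    re.IGNORECASE,
)


def extract_pairs(html: str) -> list:
    """Return (Country1, Country2) pairs, i.e. (code, description) tuples."""
    return [(m["code"], m["name"]) for m in ROW_RE.finditer(html)]


if __name__ == "__main__":
    for code, name in extract_pairs(SAMPLE_HTML):
        print(code, "|", name)
```

Any attribute on a tag, or any nested tag inside a cell, would break this pattern.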

Asteroid

Ah, yes, of course. In this case it's not so much that there's a CSV sitting on a server; rather, the CSV is created by the browser from the information in the webpage. In that case, as you suggest, the route to go down is to parse the HTML in the usual fashion. For some reason I had become fixated on finding the CSV file rather than seeing it as the same thing as the table on the page. Thank you for the insight, and for the helpful example. Every time I dabble with regex I get a little bit better, and then I forget it all!
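
To sum up the approach: the page's JavaScript presumably builds the CSV from the same table data and hands it to the browser as a blob: URL, so the equivalent step outside the browser is to parse the table out of the raw HTML and write the rows yourself. A minimal sketch using only Python's standard library (html.parser and csv) follows. It assumes the data sits in ordinary tr/td/th markup, and the sample table is hypothetical. A real HTML parser tolerates attributes, nested tags, and stray whitespace that would break a regex.

```python
import csv
import io
from html.parser import HTMLParser
from typing import List, Optional

# Hypothetical markup; the real page structure is not shown in the thread.
SAMPLE_HTML = """
<table class="data">
  <tr><th>Code</th><th>Label</th></tr>
  <tr><td class="code">AA</td><td><span>Example country A</span></td></tr>
  <tr><td class="code">BB</td><td>Example country B, with a comma</td></tr>
</table>
"""


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[List[str]] = []
        self._row: Optional[List[str]] = None
        self._cell: Optional[List[str]] = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            # Join text fragments (e.g. from nested <span>s) and collapse whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None
            self._cell = None

    def handle_data(self, data):
        # Only keep text that appears inside a cell.
        if self._cell is not None:
            self._cell.append(data)


def table_to_csv(html: str) -> str:
    """Return the table rows found in `html` as CSV text."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    csv.writer(out).writerows(parser.rows)  # quotes cells that contain commas
    return out.getvalue()


if __name__ == "__main__":
    print(table_to_csv(SAMPLE_HTML), end="")
```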
