Web Scraping JavaScript in Alteryx
Good morning, I'm trying to scrape a table with changing data in Alteryx from the following site: https://www.adr.com/drprofile/66987V109
I just want the table with the following data:
DRs Outstanding 201,978,216
DR Market Cap 21.23B USD
Underlying Shares Outstanding 2,068,264,000
Company Market Cap 217.44B USD
DR% of Company Market Cap 9.77%
When I try using the Download tool, the output says, "<!doctype html><html lang="en"><head><meta charset="utf-8"/><meta name="viewport" content="width=device-width,initial-scale=1,shrink-to-fit=no"/><meta http-equiv="X-UA-Compatible" content="IE=edge"/><meta name="theme-color" content="#000000"/><link rel="manifest" href="/manifest.json"><link rel="shortcut icon" href="/favicon.ico"><title>J.P. Morgan's adr.com | The premier site for the global investor</title><script defer="defer" src="/static/js/main.8b0acd94.js"></script></head><body><noscript>You need to enable JavaScript to run this app.</noscript><div id="app"></div><script src="/globals.js"></script></body></html>"
Would anyone be able to tell me if it's possible to use the download tool to get this portion of the table without error?
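For anyone hitting the same output: the HTML above contains no table at all, only script tags and an empty `<div id="app">` that the page's JavaScript fills in later. A minimal Python sketch of that check, assuming that "no table, some scripts, and an empty app root" is a good enough heuristic (the helper names here are made up for illustration):

```python
from html.parser import HTMLParser


class ShellProbe(HTMLParser):
    """Counts <table> and <script> tags and looks for a <div id="app"> root."""

    def __init__(self):
        super().__init__()
        self.tables = 0
        self.scripts = 0
        self.has_app_root = False

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables += 1
        elif tag == "script":
            self.scripts += 1
        elif tag == "div" and dict(attrs).get("id") == "app":
            self.has_app_root = True


def looks_like_js_shell(html: str) -> bool:
    """Heuristic (an assumption, not a rule): scripts and an app root, but no table."""
    probe = ShellProbe()
    probe.feed(html)
    return probe.tables == 0 and probe.scripts > 0 and probe.has_app_root


# Trimmed copy of the Download tool output quoted above.
DOWNLOADED = (
    '<!doctype html><html lang="en"><head><title>adr.com</title>'
    '<script defer="defer" src="/static/js/main.8b0acd94.js"></script></head>'
    '<body><noscript>You need to enable JavaScript to run this app.</noscript>'
    '<div id="app"></div><script src="/globals.js"></script></body></html>'
)

print(looks_like_js_shell(DOWNLOADED))  # True
```

If this returns True for a page, the numbers you see in the browser are most likely loaded by a separate request after the page opens, so they will not be in the downloaded HTML.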
Hi @JDG0711
I tried Alteryx's Download tool to no avail: the JP Morgan page renders its content dynamically, and the Download tool doesn't capture it. I then tried Python, and that failed too, because the data to download (screenshot below) is a table without a header 😲, and as far as I know Pandas does not handle header-less tables, though my knowledge may be limited.
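A header-less table is not necessarily a blocker once you have the rendered HTML. Here is a rough sketch using only Python's standard library that reads each two-cell row as a label/value pair. The sample markup is an assumption for illustration (built from the values in the question); the real page's markup is not shown in this thread and may well differ.

```python
from html.parser import HTMLParser


class RowCollector(HTMLParser):
    """Collects the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def parse_label_value_table(html: str) -> dict:
    """Treats the first cell of each two-cell row as the label, the second as the value."""
    collector = RowCollector()
    collector.feed(html)
    return {row[0]: row[1] for row in collector.rows if len(row) == 2}


# Hypothetical markup; only the labels and values come from the question.
SAMPLE_HTML = """
<table>
  <tr><td>DRs Outstanding</td><td>201,978,216</td></tr>
  <tr><td>DR Market Cap</td><td>21.23B USD</td></tr>
  <tr><td>DR% of Company Market Cap</td><td>9.77%</td></tr>
</table>
"""

for label, value in parse_label_value_table(SAMPLE_HTML).items():
    print(f"{label}: {value}")
```

This only helps if the table is actually present in the HTML you fetched, which is not the case for the raw Download tool output in the question.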
Arnaldo
Hi @JDG0711,
Sparksun has the right solution here - but I thought I would add a bit more information.
They have captured the API request from the page itself to get at the data you're looking for. You can do this yourself by using your browser's inspect function and looking at the network requests.
Regards,
Ben
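To illustrate the idea of calling the underlying API directly, here is a minimal Python sketch. The URL is a placeholder, not the real endpoint (the actual request URL is not given in this thread and has to be copied from your browser's Network tab), and the headers are an assumption about what such an endpoint might expect.

```python
import json
import urllib.request

# Hypothetical placeholder: replace with the request URL copied from the Network tab.
API_URL = "https://example.com/api/drprofile/66987V109"


def build_request(url: str) -> urllib.request.Request:
    """Builds a GET request that asks for JSON (headers are an assumption)."""
    return urllib.request.Request(
        url,
        headers={"Accept": "application/json", "User-Agent": "Mozilla/5.0"},
    )


def parse_payload(raw: bytes):
    """Decodes a UTF-8 JSON response body into Python objects."""
    return json.loads(raw.decode("utf-8"))


def fetch_json(url: str):
    """Sends the request and returns the decoded JSON payload."""
    with urllib.request.urlopen(build_request(url), timeout=30) as resp:
        return parse_payload(resp.read())


# Usage, once API_URL points at the real endpoint:
#     data = fetch_json(API_URL)
#     print(data)
```

Once you have the real request URL, the same URL can presumably be fed to the Download tool in place of the page address, since the response is plain data rather than a JavaScript-rendered page.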
Hi @JDG0711
Indeed, the workflow posted by @sparksun bypasses the constraints I commented on earlier by submitting your request via a third-party service. I googled that service without finding any comments, reviews, or feedback. You should be aware that what I believe is your account number is exposed to that third-party service.
Arnaldo
Thanks Sparksun, this worked!
Thanks Ben, I'm familiar with the "inspect" function, but regarding the network requests, where do you see the actual API request when you go in there?
These screenshots may help:
- First click on the Network tab.
- Then click on each Path on that screen, until you see the information you are chasing on the response pane.
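If clicking through every request by hand gets tedious, most browsers let you export the Network tab as a HAR file (a JSON log of requests and responses). The sketch below searches such a log for responses containing a value you can see on the page. The sample entries and URLs are invented for illustration, and whether response bodies are included depends on how the browser exported the file.

```python
import base64


def response_text(entry: dict) -> str:
    """Returns the response body of one HAR entry, decoding base64 if flagged."""
    content = entry.get("response", {}).get("content", {})
    text = content.get("text") or ""
    if content.get("encoding") == "base64":
        text = base64.b64decode(text).decode("utf-8", errors="replace")
    return text


def find_requests(har: dict, needle: str) -> list:
    """Lists the URLs of requests whose response body contains `needle`."""
    entries = har.get("log", {}).get("entries", [])
    return [e["request"]["url"] for e in entries if needle in response_text(e)]


# Invented sample in HAR layout; a real file would be read with
# json.load(open("adr.har")), where "adr.har" is a hypothetical file name.
SAMPLE_HAR = {
    "log": {
        "entries": [
            {
                "request": {"url": "https://example.com/static/js/main.js"},
                "response": {"content": {"mimeType": "application/javascript",
                                         "text": "console.log('app');"}},
            },
            {
                "request": {"url": "https://example.com/api/profile"},
                "response": {"content": {"mimeType": "application/json",
                                         "text": '{"drsOutstanding": "201,978,216"}'}},
            },
        ]
    }
}

print(find_requests(SAMPLE_HAR, "201,978,216"))
# ['https://example.com/api/profile']
```

Searching for a figure copied from the page, such as the DRs Outstanding value, is usually a quick way to spot which request carries the table data.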
Thanks to @sparksun and @Ben_H, we learned something new today!!! My apologies for the panic attack earlier regarding the source of the URL to use.
Arnaldo
