Hi @Qiu,
I hope you are doing well.
I am trying to parse the HTML file. Could you please help me with that?
Thanks,
Hi @Simon1187
What are you trying to find in this file? Looking at it in Chrome, it contains only script functions and no data. This looks like the output of an initial web call that then uses the URLs embedded in it to fetch the actual data. The key URL here seems to be the call to HTTP://api.autopilothq.com/anywhere/... that contains the expired token. This is probably the next call in the chain, and the data may come from it.
Edit: I just realized that you are the same person I answered yesterday who wants to scrape the data from 4000 pages. From this first sample, it looks like you'll need a Selenium-based solution to prerender the complete pages so you can scrape the data from the final rendered page.
The other alternative is to add Download tools that call the embedded URLs until you get to the final data you need. At each step, look at the resulting HTML in Chrome to help you find the possible links. That's how I found the autopilot link. Since all the pages that you're searching are created by the same team, the format of the pages and the number of embedded calls will probably be the same, so a single workflow will probably be able to handle them all.
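To illustrate the "follow the embedded links" idea outside of a workflow, here is a minimal Python sketch that lists the URLs found in a downloaded page. It is only an assumption about how you might do the inspection step; the function name `find_embedded_urls`, the regular expression, and the sample HTML with example.com addresses are all made up for illustration and are not taken from your file.

```python
import re

# Rough pattern for absolute http/https URLs inside HTML or inline scripts.
# It stops at whitespace, quotes, angle brackets, backslashes and ")".
URL_PATTERN = re.compile(r"""https?://[^\s"'<>\\)]+""", re.IGNORECASE)


def find_embedded_urls(html: str) -> list[str]:
    """Return the distinct URLs found in the text, in order of appearance."""
    seen: list[str] = []
    for match in URL_PATTERN.findall(html):
        if match not in seen:
            seen.append(match)
    return seen


# Hypothetical sample standing in for the script-only page.
sample_html = """
<html><head><script>
  var endpoint = "https://example.com/api/step2?token=abc";
  fetch('HTTP://example.com/data.json');
</script></head><body></body></html>
"""

if __name__ == "__main__":
    for url in find_embedded_urls(sample_html):
        print(url)
```

Each URL it prints is a candidate for the next download in the chain. A regular expression like this is deliberately crude, so expect to filter the list by hand on the first page before automating it.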
Dan
Hello @Simon1187
As you mentioned privately, you are trying to connect to a Cube.js site. Your best option is probably to connect via their API. This avoids having to chain multiple web downloads and scrape the results. Once you get a key to access the site, you'll be able to access the data directly. You can check this article for the basics on connecting to a REST API and this one for an introduction to authentication.
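For reference, a keyed REST call has roughly the shape sketched below in Python. This is an assumption-heavy sketch rather than the site's actual setup: the base URL, the `/load` path with a JSON `query` parameter, the token placed directly in the `Authorization` header, and the `Orders.count` measure are all placeholders that you should check against the documentation for the deployment you are connecting to.

```python
import json
import urllib.parse
import urllib.request


def build_api_request(base_url: str, token: str, query: dict) -> urllib.request.Request:
    """Build (but do not send) a GET request carrying the key in a header."""
    # The query object is serialized to JSON and passed as a URL parameter.
    params = urllib.parse.urlencode({"query": json.dumps(query)})
    url = f"{base_url.rstrip('/')}/load?{params}"
    # Assumption: the service expects the key in the Authorization header.
    return urllib.request.Request(url, headers={"Authorization": token})


def fetch_json(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON body (needs network access)."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


if __name__ == "__main__":
    # Placeholder values; replace them with the real endpoint and key.
    req = build_api_request(
        "https://example.com/cubejs-api/v1",
        "YOUR_API_TOKEN",
        {"measures": ["Orders.count"]},
    )
    print(req.get_method(), req.full_url)
```

In a workflow, the same pieces map onto a Download tool: the URL with its query string goes in the URL field and the key goes in a header.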
Dan