Alteryx Designer Desktop Discussions

Gaurav3 · ‎12-26-2019

Hello,

I am trying to retrieve article headlines and their associated tags from this website: https://www.cnbc.com/metals/.

How can I scrape the data?

WilliamR · ‎12-26-2019

Hi @Gaurav3 ,

Use the Download tool to fetch the page, then parse the returned HTML to extract the desired data.
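Outside Alteryx, the same fetch-then-parse idea can be sketched in plain Python. This is only an illustration, not the workflow itself: the class name `Card-title` is an assumption about the page's markup (check the real HTML first), and `HeadlineParser`, `extract_headlines` and `fetch_html` are made-up helper names.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class HeadlineParser(HTMLParser):
    """Collect the text of <a> tags whose class list contains a target class."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.headlines = []
        self._buffer = None  # a list while inside a matching <a>, else None

    def handle_starttag(self, tag, attrs):
        if tag == "a" and self._buffer is None:
            classes = (dict(attrs).get("class") or "").split()
            if self.target_class in classes:
                self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._buffer is not None:
            # Collapse runs of whitespace before storing the headline.
            text = " ".join("".join(self._buffer).split())
            if text:
                self.headlines.append(text)
            self._buffer = None


def extract_headlines(html, target_class="Card-title"):
    # "Card-title" is a hypothetical class name, not confirmed for cnbc.com.
    parser = HeadlineParser(target_class)
    parser.feed(html)
    parser.close()
    return parser.headlines


def fetch_html(url):
    # Some sites reject requests that lack a browser-like User-Agent.
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


# Offline sample so the parsing step can be tried without a network call.
sample = '<div><a class="Card-title" href="/x">Gold  rises</a><a class="other">Skip</a></div>'
print(extract_headlines(sample))
```

To try it against the live page, pass the result of `fetch_html("https://www.cnbc.com/metals/")` to `extract_headlines`, adjusting the class name to whatever the page actually uses.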


GiuseppeC · ‎12-26-2019

Hi @Gaurav3,

In addition to what @WilliamR suggested: the underlying HTML of the page you posted comes in an unusual format, so I added some logic as an example of how to parse it and identify the headlines.

See below and attached:

Hope this helps!

Giuseppe

fmvizcaino · ‎12-26-2019

Hi @Gaurav3 ,

One suggestion, unrelated to the workflow itself, is to get the data from the RSS feed page instead. It will be easier to get all the headlines that way.

https://www.cnbc.com/rss-feeds/
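To illustrate why a feed is easier to work with than the page HTML, here is a rough Python sketch that reads titles, links and categories from RSS 2.0 XML. It assumes a standard RSS 2.0 layout (`item` elements with `title`, `link` and optional `category` children); whether a given CNBC feed carries `category` tags is not something I have verified. `parse_rss_items` and the sample feed are made up for the example.

```python
import xml.etree.ElementTree as ET


def parse_rss_items(xml_text):
    """Return one dict per <item> with its title, link and category list."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": (item.findtext("title") or "").strip(),
            "link": (item.findtext("link") or "").strip(),
            # <category> is optional in RSS 2.0, so this list may be empty.
            "categories": [c.text.strip() for c in item.findall("category") if c.text],
        })
    return items


# Hypothetical feed content, used here instead of a live download.
sample_feed = """<rss version="2.0"><channel><title>Example</title>
<item><title>Gold rises</title><link>https://example.com/gold</link><category>Metals</category></item>
<item><title>Silver slips</title><link>https://example.com/silver</link></item>
</channel></rss>"""

for entry in parse_rss_items(sample_feed):
    print(entry["title"], entry["categories"])
```

The page linked above lists the available feeds; the XML of whichever feed you pick would be passed to `parse_rss_items` in place of `sample_feed`.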

Best,

Fernando Vizcaino

Gaurav3 · ‎01-01-2020

Thank You! 🙂
