
Alteryx Designer Desktop Discussions


Use Alteryx to scrape a news site for stories on specific companies?

Henry_Gunn
6 - Meteoroid

Does anyone know if it's possible to have Alteryx scrape a news site (I have login credentials) and return the URL of every article on the page that mentions a specific company name in its title or body?

 

I am trying to automate a competitive analysis setup and would like all applicable news stories (matched on the title or on specific words in the body of the article) written to a CSV file or, even better, to Excel.

 

I am new to Alteryx, so please let me know if anyone can help.

2 REPLIES
pedrodrfaria
13 - Pulsar

Hi @Henry_Gunn 

 

You can scrape the site by using the Download tool to fetch the page's HTML. The information you want is embedded in that HTML, so you then have to parse it out. It can become really tricky, but this is how you would need to do it.
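
To give a feel for the "parse it out" step, here is a minimal Python sketch that uses only the standard library. It assumes the page HTML has already been downloaded (for example by the Download tool) and that the article title is the text of each link; real news sites will differ. The names `ArticleLinkParser`, `find_company_links`, `to_csv`, the sample HTML, and the `example.com` URLs are all made up for illustration.

```python
import csv
import io
from html.parser import HTMLParser
from urllib.parse import urljoin


class ArticleLinkParser(HTMLParser):
    """Collect (absolute_url, link_text) pairs for every <a href=...>."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []      # finished (url, text) pairs
        self._href = None    # href of the <a> currently open, if any
        self._text = []      # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL.
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Join fragments and collapse runs of whitespace.
            text = " ".join("".join(self._text).split())
            self.links.append((self._href, text))
            self._href = None


def find_company_links(html, base_url, company):
    """Return links whose text contains the company name (case-insensitive)."""
    parser = ArticleLinkParser(base_url)
    parser.feed(html)
    parser.close()
    needle = company.lower()
    return [(url, text) for url, text in parser.links if needle in text.lower()]


def to_csv(rows):
    """Render (url, title) rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["url", "title"])
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical page content standing in for the downloaded HTML.
SAMPLE_HTML = """
<html><body>
  <a href="/news/acme-q3">Acme Corp posts record Q3 results</a>
  <a href="/news/weather">Storm warning issued for the coast</a>
  <a href="https://example.com/news/merger">Rival eyes <b>ACME</b> merger</a>
</body></html>
"""

matches = find_company_links(SAMPLE_HTML, "https://example.com", "acme")
csv_text = to_csv(matches)
```

Matching on the article body rather than the title would need a second download per article URL, and a site behind a login would also need the session cookies or headers passed along with each request.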

 

Pedro.

cgoodman3
14 - Magnetar

As @pedrodrfaria mentions, web scraping can be achieved by using the Download tool to return the underlying HTML. It is also possible to web scrape using Python code.

 

While it is possible, some points to consider:

1) What are the T&Cs of the site you are scraping? Many sites have restrictions on scraping.
2) Is there an API available? If so, this would be the preferred option.
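
On point 1, a site's robots.txt file is a separate, machine-readable signal that can be checked alongside the T&Cs (it does not replace reading them). A minimal sketch using Python's standard `urllib.robotparser`, where the helper name `allowed_by_robots`, the user agent `my-bot`, the sample rules, and the `example.com` URLs are assumptions for illustration:

```python
from urllib.robotparser import RobotFileParser


def allowed_by_robots(robots_txt, user_agent, url):
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines; no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt content; a real one lives at <site>/robots.txt.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /archive/
"""

ok_news = allowed_by_robots(
    SAMPLE_ROBOTS, "my-bot", "https://example.com/news/acme-q3"
)
ok_archive = allowed_by_robots(
    SAMPLE_ROBOTS, "my-bot", "https://example.com/archive/2019"
)
```

With these sample rules, the news URL is permitted and the archive URL is not.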

 

Chris