SOLVED

URL Pull

willmccartney
5 - Atom

Hello - First time poster here and new to Alteryx. I am trying to create a workflow to accomplish a relatively simple task: Return the first five URLs for a given search term.

I have a list of companies that have been concatenated with their respective regions, in Excel. I'd like the Alteryx workflow to reference the concatenated value, and return the first five URLs from a Google search. Does anyone have an example workflow I could reference? Thank you! 

T_Willins
14 - Magnetar

Hi @willmccartney,

Welcome to the Alteryx Community! This can be done using the Download tool to scrape the data. A word of warning, however: what you are trying to do is not allowed by Google and could get you (and your company) banned from accessing Google. Websites publish rules limiting which pages and data robots (such as an Alteryx workflow) may access. You can see these rules by typing in the website address and adding /robots.txt to the URL. See below for google.com/robots.txt

[Image: Google disallow.png, a screenshot of google.com/robots.txt]
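
For anyone who wants to run the robots.txt check described above outside of a browser, here is a minimal sketch in Python using the standard library's urllib.robotparser. The sample rules, the example.com URLs, and the helper name is_allowed are assumptions made for illustration; they are not a copy of any real site's robots.txt, and this is not part of the Alteryx workflow.

```python
from urllib.robotparser import RobotFileParser

# Illustrative rules only (an assumption), not any real site's robots.txt.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /search
"""


def is_allowed(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("https://www.example.com/search?q=acme",
                "https://www.example.com/about"):
        print(url, "allowed" if is_allowed(SAMPLE_ROBOTS_TXT, url) else "disallowed")
```

To check a live site instead, RobotFileParser also offers set_url() followed by read(), which downloads the site's robots.txt before can_fetch() is called.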

To see how web scraping can work, I have attached a workflow that pulls data from an imdb.com page.
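
The attached workflow is not reproduced here. As a rough illustration of the "first five URLs" idea from the question, this Python sketch pulls the first few absolute links out of a page's HTML using only the standard library. The class name LinkCollector, the helper first_links, and the sample HTML are hypothetical; fetching the page (and confirming the site permits it) is left out on purpose.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect absolute href values from <a> tags, up to a limit."""

    def __init__(self, limit: int = 5):
        super().__init__()
        self.limit = limit
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Ignore non-anchor tags and anything after the limit is reached.
        if tag != "a" or len(self.links) >= self.limit:
            return
        href = dict(attrs).get("href")
        # Keep only absolute http(s) links; relative links are skipped.
        if href and href.startswith(("http://", "https://")):
            self.links.append(href)


def first_links(html: str, limit: int = 5) -> list:
    """Return the first `limit` absolute links found in the HTML text."""
    collector = LinkCollector(limit)
    collector.feed(html)
    return collector.links


if __name__ == "__main__":
    # Made-up HTML standing in for a downloaded page.
    sample = '<a href="/local">skip</a>' + "".join(
        f'<a href="https://example.com/{i}">link {i}</a>' for i in range(1, 8)
    )
    print(first_links(sample))
```

In Alteryx itself, the equivalent steps would be the Download tool to fetch the page followed by parsing the returned HTML for links.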

Have fun on your Alteryx journey!