Alteryx Knowledge Base

Web Crawling

Alteryx

Question

Does Alteryx support web crawling?

Answer

Yes. In Alteryx you can retrieve a web page, find its embedded links (e.g., using regular expressions), and add them to a queue of "links to visit". You can then keep visiting pages and queueing newly found links indefinitely, while also extracting other details of interest from each page you visit.
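The visit-and-queue loop described above can be sketched outside Alteryx too. The Python below is a minimal, hypothetical illustration, not an Alteryx API: `extract_links` uses a regular expression to find `href` values, and `crawl` keeps a breadth-first queue of links to visit. The `fetch` callable and the `max_pages` limit are assumptions added so the example is testable and always terminates.

```python
import re
from collections import deque
from typing import Callable
from urllib.parse import urldefrag, urljoin

# Naive pattern for href="..." or href='...' attributes. Regex link extraction
# is fragile on real-world HTML, but it mirrors the approach in the answer.
HREF_RE = re.compile(r"""href\s*=\s*["']([^"']+)["']""", re.IGNORECASE)


def extract_links(base_url: str, html: str) -> list[str]:
    """Return absolute http(s) URLs for every href found in `html`."""
    links = []
    for href in HREF_RE.findall(html):
        # Resolve relative links against the page URL and drop any #fragment.
        absolute, _fragment = urldefrag(urljoin(base_url, href.strip()))
        if absolute.startswith(("http://", "https://")):
            links.append(absolute)
    return links


def crawl(seeds: list[str], fetch: Callable[[str], str], max_pages: int = 50) -> dict[str, str]:
    """Breadth-first crawl starting from `seeds`.

    `fetch` (hypothetical) returns a page's HTML for a URL; `max_pages` is an
    assumed stopping condition so the loop always terminates.
    """
    queue = deque(seeds)  # the "links to visit"
    seen = set(seeds)     # never queue the same URL twice
    pages: dict[str, str] = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except Exception:
            continue      # skip pages that fail to download
        pages[url] = html  # other details of interest could be extracted here
        for link in extract_links(url, html):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return pages
```

The `seen` set is what keeps the crawl from looping forever on pages that link back to each other; without it, "continue indefinitely" would literally never end on most sites.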

In a Text Input tool, enter the URLs to crawl. Alteryx can also take the URLs from a data stream (for example, a database table that holds all of the URLs you want to crawl) and iteratively repeat the process of connecting to each URL and retrieving the code behind it:

[Screenshot: crawl1.JPG]
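The row-per-URL flow (a list of URLs in, one download per row out) looks roughly like this in Python. This is a sketch under assumptions: `read_urls`, `download_all`, the `URL` column name, and the `Content` output field are hypothetical names, and `fetch` stands in for whatever actually performs the download.

```python
import csv
from typing import Callable, Iterable, Iterator


def read_urls(lines: Iterable[str], column: str = "URL") -> Iterator[str]:
    """Yield non-empty URLs from CSV text that has a header row.

    The "URL" column name is an assumption; use the field your Text Input
    tool or database table actually provides.
    """
    for row in csv.DictReader(lines):
        url = (row.get(column) or "").strip()
        if url:
            yield url


def download_all(urls: Iterable[str], fetch: Callable[[str], str]) -> list[dict[str, str]]:
    """Connect to each URL in turn and keep one output record per input row."""
    return [{"URL": url, "Content": fetch(url)} for url in urls]
```

Because `read_urls` accepts any iterable of text lines, the same code works whether the URLs come from a typed-in list, a file, or rows exported from a database.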

Next, add a Download tool and point it at the field that contains the web address:

[Screenshot: crawl2.JPG]

Alteryx returns the full content available at that URL (for a web page, its HTML source):

[Screenshot: crawl3.JPG]
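For a single row, the Download tool's result is comparable to an HTTP GET that returns the whole response body. Below is a minimal standard-library sketch of that idea, not a description of how the tool is implemented; the `download` function name, the User-Agent string, and the UTF-8 fallback when the server declares no charset are all assumptions.

```python
from urllib.request import Request, urlopen


def download(url: str, timeout: float = 30.0) -> str:
    """Return the full response body for `url` as text (hypothetical helper)."""
    # Identify the client; the User-Agent value here is only a placeholder.
    request = Request(url, headers={"User-Agent": "example-crawler/0.1"})
    with urlopen(request, timeout=timeout) as response:
        # Use the charset the server declares, falling back to UTF-8 (assumption).
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Calling it with a page address returns that page's HTML as one string, which can then be saved or parsed.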

The attached v10.0 workflow connects to Wikipedia and "crawls" the content of that URL. The downloaded content can then be saved, parsed, and so on. Adding further functionality, such as the link queue described above, can turn this into a very powerful crawling engine.
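As one example of the parsing step, the sketch below pulls the page title and visible text out of downloaded HTML using Python's built-in `html.parser`. `PageParser` and `parse_page` are hypothetical names, and skipping `<script>`/`<style>` content is a simplifying assumption rather than a complete text-extraction method.

```python
from html.parser import HTMLParser


class PageParser(HTMLParser):
    """Collect the <title> and the visible text chunks of an HTML page."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.chunks: list[str] = []
        self._in_title = False
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def parse_page(html: str) -> tuple[str, str]:
    """Return (title, visible text) for an HTML document."""
    parser = PageParser()
    parser.feed(html)
    parser.close()
    return parser.title.strip(), " ".join(parser.chunks)
```

An HTML parser like this is also a sturdier way to find links than a regular expression, since it handles attribute quoting and entity decoding for you.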