This site uses different types of cookies, including analytics and functional cookies (its own and from other sites). To change your cookie settings or find out more, click here. If you continue browsing our website, you accept these cookies.
Yes. In Alteryx you can look at a web page, find embedded links (e.g. using regular expressions), and add to a queue of "links to visit". Then continue visiting/adding indefinitely, while also extracting various other tidbits of interest from each page visited.
In a Text Input Tool, enter URLs to crawl. Alteryx can take the URLs from a data stream (a database where we have all of the URLs we want to crawl) and iteratively repeat the process of connecting and getting the code beneath that URL:
Alteryx returns the whole content available for that URL:
The attached v10.0 workflow allows you to connect to wikipedia and "crawl" the content of that URL. It can be saved, parsed etc. Additional functionality may be added to create a very powerful crawling engine.