Alteryx Designer Desktop Ideas

SeanAdams · 05-14-2017

The Download tool is currently a general-purpose tool used for many different things, from downloading FTP files to scraping websites.

However, as a general-purpose tool, it cannot serve the specific need of scraping a website without a huge amount of work. What makes Alteryx great is that it lowers the barrier so that regular folks can do really powerful analytics, but the web-scraping capabilities are not there yet and still require a tremendous amount of technical skill.

I'll go through this from top to bottom:

Split capability: The Download tool tries to be too many things to too many people. Break it up into its component parts (one for FTP, one for web scraping, etc.), each with deep specialization. You can still keep the Download tool as the super-user version, but by creating specialized tools we can make this much more user-friendly.
Connection: For enterprise users whose internet connectivity is locked down, there is no way to scrape web content without using curl. We need the ability to connect to websites without requiring curl or complex connectivity setups to navigate web proxy settings.
- Alteryx could auto-detect these settings by letting the user point to the site within a controlled browser form, as Excel does.
Parameters: Many websites explicitly support named query parameters (using ? notation). It would be very useful to let the user bind to these parameters explicitly, without complex string concatenation or %20 scrubbing to get rid of non-URL-friendly characters.
Content: Alteryx gives the user no native ability to process HTML, so extracting a specific field means reading through the page's underlying source (delivered in "DownloadedData"), guessing at the patterns the site uses for tables, spans, etc., and then writing complex regex.
- Instead, we could present the user with a view of the web page and ask them to select the elements they want.
- This would serve a dual purpose: it would make scraping user-friendly for regular folks by abstracting away the technicalities, and it would let the Download tool discard all the unwanted parts of the page, such as scripts, interstitial adverts, images, and headers and footers.
Improved post / parse capability: Sometimes the purpose of a URL is to generate a download (like the Google Finance API). Again, it would be good to observe the user on the target site to record and interpret what they are looking for and what they get back (e.g., the file from Google).
HTML & XML types: Why not an explicit data type in Alteryx for web content?
Finally, HTML awareness: The Browse tools are not currently HTML-aware, so to get useful formatting (seeing what's going on, expanding nodes, finding patterns, etc.) the content has to be copied out of Alteryx into Notepad++. Given the ubiquity of HTML parsers, pretty printers, and editors, it should be reasonably easy to obtain a cheap component that provides this capability.
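As a rough illustration of the Parameters point, here is a minimal Python sketch of binding named query parameters instead of concatenating strings and scrubbing %20 by hand. It is not Alteryx functionality; `build_url`, the example URL, and the parameter names are hypothetical, and it assumes Python 3.9+ and that any query string already on the base URL should be replaced.

```python
from urllib.parse import quote, urlencode, urlsplit, urlunsplit


def build_url(base_url: str, params: dict[str, str]) -> str:
    """Return base_url with params attached as a percent-encoded query string.

    Hypothetical helper for illustration (not an Alteryx API). Any query
    string already present on base_url is replaced by params.
    """
    scheme, netloc, path, _old_query, fragment = urlsplit(base_url)
    # quote_via=quote encodes a space as %20; the default (quote_plus) uses '+'.
    query = urlencode(params, quote_via=quote)
    return urlunsplit((scheme, netloc, path, query, fragment))


if __name__ == "__main__":
    print(build_url(
        "https://example.com/search",
        {"q": "Alteryx web scraping", "region": "US & Canada"},
    ))
```

Running it prints `https://example.com/search?q=Alteryx%20web%20scraping&region=US%20%26%20Canada`: spaces become %20, and the ampersand inside a value becomes %26 so it cannot be mistaken for a parameter separator.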

SeanAdams · 06-07-2018

Hey @TashaA

This is the web-scraping idea we talked through at Inspire. It would be VERY useful to pull web scraping into a brand-new tool: a dedicated website connector with the rich web-scraping functionality of Excel.

andyuttley · 01-15-2020

I love this idea!

It might also be good to get some additional HTML-parsing functionality alongside this (similar to Python's BeautifulSoup package). I know this can be recreated manually, but it would be great to have it out of the box.

cgoodman3 · 07-19-2020

It would be great to have something as simple as the importhtml function in Google Sheets for scraping tables from websites. This could be either a function in the Formula tool or part of the Download tool (more sensible?).

When they demonstrated copying and pasting from a website into the Text Input tool, I assumed this is what it would do.
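For a sense of what an importhtml-style table reader involves, and of the alternative to regex that the Content point in the original post asks for, here is a minimal sketch using Python's standard-library `html.parser`. It is an illustration outside Alteryx, not how Alteryx or Google Sheets implement anything; `TableExtractor` and `read_tables` are hypothetical names, and it assumes explicit closing tags and no nested tables, and ignores colspan/rowspan.

```python
from __future__ import annotations

from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect every <table> as a list of rows, each row a list of cell texts.

    Hypothetical sketch: assumes explicit </td>, </th>, </tr> tags and no
    nested tables; colspan/rowspan are ignored.
    """

    def __init__(self) -> None:
        super().__init__()
        self.tables: list[list[list[str]]] = []
        self._row: list[str] | None = None   # row currently being built
        self._cell: list[str] | None = None  # text fragments of the open cell

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join fragments and collapse whitespace, e.g. " 10.5 \n" -> "10.5".
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None

    def handle_data(self, data):
        # Text outside table cells (scripts, headers, adverts) is ignored.
        if self._cell is not None:
            self._cell.append(data)


def read_tables(html: str) -> list[list[list[str]]]:
    """Return all tables found in html."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.tables


if __name__ == "__main__":
    page = """
    <html><body>
      <script>var x = 1;</script>
      <table>
        <tr><th>Name</th><th>Value</th></tr>
        <tr><td>a</td><td> 10.5 </td></tr>
      </table>
    </body></html>
    """
    print(read_tables(page))
```

Running it prints `[[['Name', 'Value'], ['a', '10.5']]]`. The `<script>` contents never appear because text is only collected inside `<td>`/`<th>` cells, which loosely mirrors the request to discard scripts and other unwanted parts of the page.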

AlteryxCommunityTeam · 08-31-2022

Improvement to Download Tool to allow for Web Scraping