So we’re now downloading all the network-shared documents we want thanks to instructions posted on our Knowledge Base, and we’re on our way to mastering FTP in Alteryx. But what if we want to take it a step further? A lot of our users rely on FTP as a drop zone for datasets that are generated periodically (e.g. weekly, monthly, or quarterly datasets). We should then be able to schedule a workflow to coincide with those updates, automatically select the most recent dataset, crank out all the sweet data blending and analytics we have in our scheduled workflow, and proceed with the rest of our lives, right? Right. We can do just that, and with a little work up front, you can automate your FTP download and analysis to run while you’re enjoying the finer things in life. Here’s how in v10.1:
One of the greatest strengths of modern web APIs is their flexible, developer-friendly nature, which provides numerous options for both the provider and the user. However, this flexibility can make it more intimidating for business users to deal with the various data formats that these APIs provide. The purpose of this article is to familiarize you with the main data formats used by the vast majority of web APIs, and provide the basic knowledge that will allow you to confidently process the data they return into a typical tabular format.
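To make the two dominant formats concrete, here is a minimal Python sketch of turning a JSON response and an equivalent XML response into the same tabular rows. The response bodies and field names are hypothetical, stand-ins for whatever your API actually returns:

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical API responses illustrating the two dominant formats.
json_body = '{"results": [{"name": "Alice", "city": "Denver"}, {"name": "Bob", "city": "Boulder"}]}'
xml_body = ('<results><row><name>Alice</name><city>Denver</city></row>'
            '<row><name>Bob</name><city>Boulder</city></row></results>')

# JSON: a list of objects maps naturally to rows; keys become column names.
json_rows = [(r["name"], r["city"]) for r in json.loads(json_body)["results"]]

# XML: each <row> element becomes one record; child tags become columns.
xml_rows = [(row.findtext("name"), row.findtext("city"))
            for row in ET.fromstring(xml_body).findall("row")]

print(json_rows)  # [('Alice', 'Denver'), ('Bob', 'Boulder')]
print(xml_rows)   # same rows, recovered from the XML shape
```

Either way, the end state is the same: one record per row, one field per column, ready for blending.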
One of the biggest reasons people love Alteryx is its ability to read a very large number of different data sources. This article includes a workflow that can read non-natively supported formats, like a Word doc or a PDF, by using an open-source program to convert these formats to plain text.
Web scraping, the process of extracting information (usually tabulated) from websites, is an extremely useful way to gather web-hosted data that isn't supplied via APIs. In many cases, if the data you are looking for is stand-alone or captured completely on one page (no need for dynamic API queries), it is even faster than developing direct API connections.
In a previous article, we showed you how to upload to FTP using the Download tool. With the release of Alteryx 10.5, the Download tool now supports uploading to SFTP. With this addition, we'll take the opportunity to show you some more examples of uploading data to SFTP/FTP and show you how seamless it can be.
API connections give access to many web-based applications, database systems, or programs by exposing objects or actions to a developer in an abstracted format that can easily be integrated into another program.
Binary Large OBject (BLOB) data types are often used to store images, audio, and other multimedia files/objects in a single, standardized format for simplified database management - making them a frequent file type in the Alteryx Designer. Fortunately, with the Blob Convert Tool, along with the Blob Input and Blob Output Developer Tools, working with BLOB objects is no more difficult than the file types they represent!
Believe it or not, data can be beautiful. Take your black and white data points and add some color to them in visuals with the suite of tools found in the Reporting Category https://help.alteryx.com/current/index.htm#Getting_Started/AllTools.htm#Report_Presentation_Tools ! If you’re looking to create reports, presentations, images, or simply output data with a bang, you can use the Render Tool https://help.alteryx.com/current/PortfolioComposerRender.htm paired with other Reporting Tools to create HTML files (*.html), Composer files (*.pcxml), PDF documents (*.pdf), RTF documents (*.rtf), Word documents (*.docx), Excel documents (*.xlsx), MHTML files (*.mht), PowerPoint presentations (*.pptx), PNG images (*.png), and even Zip files (*.zip) – packed with formatting and visual aesthetic that’ll make any data-geek’s mouth water.
If you haven’t used the Run Command Tool just yet, that’s great. It means that whatever your analyses required, we had it covered with basic Designer functionality. But in spite of how great the Designer is, it just can’t do everything. There is a utility on your computer that can do just about anything, however, and it’s the command line . The Run Command Tool pairs the two into a dynamic tag-team duo that can wrestle all the computation you could need into one, integrated, Designer workflow:
Question Does Alteryx support web crawling?
Yes. In Alteryx you can look at a web page, find embedded links (e.g. using regular expressions), and add to a queue of "links to visit". Then continue visiting/adding indefinitely, while also extracting various other tidbits of interest from each page visited.
In a Text Input Tool, enter URLs to crawl. Alteryx can take the URLs from a data stream (a database holding all of the URLs we want to crawl) and iteratively repeat the process of connecting and retrieving the source code behind each URL:
Use the Download Tool and point it to a web address:
Alteryx returns the whole content available for that URL:
The attached v10.0 workflow allows you to connect to Wikipedia and "crawl" the content of that URL. The content can be saved, parsed, etc. Additional functionality may be added to create a very powerful crawling engine.
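The crawl loop described above - visit a URL, extract embedded links with a regular expression, and queue unseen links for later visits - can be sketched in plain Python. Here `fetch()` is a hypothetical stand-in for the Download tool, returning canned HTML instead of hitting the network, and the example pages are made up:

```python
import re
from collections import deque

# Two hypothetical pages that link to each other, standing in for live HTML.
PAGES = {
    "https://example.com/a": '<a href="https://example.com/b">b</a>',
    "https://example.com/b": '<a href="https://example.com/a">a</a>',
}

def fetch(url):
    # Stand-in for the Download tool: returns the page's HTML source.
    return PAGES.get(url, "")

# Regular expression that finds embedded links, as the article suggests.
LINK_RE = re.compile(r'href="(https?://[^"]+)"')

def crawl(seed, limit=10):
    queue, seen, visited = deque([seed]), {seed}, []
    while queue and len(visited) < limit:
        url = queue.popleft()
        visited.append(url)              # "visit" the page
        for link in LINK_RE.findall(fetch(url)):
            if link not in seen:         # only queue links not seen before
                seen.add(link)
                queue.append(link)
    return visited

print(crawl("https://example.com/a"))
# ['https://example.com/a', 'https://example.com/b']
```

The `seen` set is what keeps the "continue visiting/adding indefinitely" loop from revisiting the same pages forever; the `limit` plays the role of a stopping condition you would configure in an iterative macro.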
Question Can you wait X seconds between processing each row in Alteryx?
Yes! Thanks to Inviso for creating the Inviso Macro Pack and posting it on their blog here.
The "Wait a Second" macro lets you wait X number of seconds before processing each row in the dataset.
One application is contacting an API with multiple requests. The WaitASecond macro can pause between requests long enough for the API to process multiple rows without issue.
It can also be used to scrape sites without putting heavy loads on their servers. Inviso has a sample of scraping the Alteryx Community (see Insights to the Alteryx Community).
As you can see, the part of the flow that runs through the WaitASecond tool gets NOW timestamps that are 5 seconds apart, whereas the bottom stream, which does not run through the WaitASecond tool, gets the same timestamp for every row.
There are essentially two macros:
The first one assigns a unique id to each record and then uses that ID for the batch macro.
The batch macro uses a Run Command tool to run a ping that waits x seconds before timing out (it pings 18.104.22.168; if that address exists on your network, this won't work).
The macro can be downloaded here (InvisoMacros.zip).
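Outside Alteryx, the same throttling idea is a simple loop: sleep a fixed number of seconds between rows. A minimal Python sketch, where `process()` is a hypothetical stand-in for the per-row work (an API call, a page scrape, etc.):

```python
import time

def process(record):
    # Hypothetical per-row work; in practice this would be an API request.
    return record.upper()

def process_with_delay(records, delay_seconds=1.0):
    """Process records one at a time, pausing between each row."""
    results = []
    for i, record in enumerate(records):
        if i > 0:
            time.sleep(delay_seconds)  # wait X seconds before every row after the first
        results.append(process(record))
    return results

print(process_with_delay(["a", "b", "c"], delay_seconds=0.1))
# ['A', 'B', 'C']
```

The ping-with-timeout trick in the macro accomplishes the same pause; it exists because classic Windows batch scripting has no built-in sleep command.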
To do your best data blending, you need the flexibility to connect to as many data stores as possible. No puzzle reveals a complete picture without all the pieces in place, and the same adage holds true in analytics. While we’re proud to boast a list of supported input file formats and data platforms that may even be large enough for database storage itself, unfortunately, in the ever-expanding world of data you just can’t catch them all. Enter the Download Tool. In addition to FTP access, this tool can web scrape or transfer data via API (check your data source – there’s almost always an API!), giving you access to even the most secluded data stores. With the examples compiled below, and the wealth of data accessible on the web, you can turn nearly any analytical puzzle into the Mona Lisa:
This article is part of the CS Macro Development Series. The goal of this series is to communicate tips, tricks, and the thought process that goes into developing good, dynamic macros.
Implementing APIs into macros isn’t a difficult process; you only need to first understand macros and how they’ll interact with your API requests. In short, a macro gives you the ability to encapsulate just about any repeatable process in an Alteryx workflow as a tool. Understanding that, you only need to identify what in your API request will need to change with each row the request is being repeated for, and how to update this request dynamically.
For each row of your input data stream, use fields to reference the individual values; doing so in a Formula tool builds out the parts of the request that change with each record. If instead you need to update an argument of the request just once for all your records, try using an Interface tool and a place-holding value. Need to update parts of a request for only certain records? You can use formula logic or a batch macro’s control parameter approach.
Consider the Google Maps Geocoding API request format below:
If we were to send a request to their API to geocode a single address (specifying an xml output), this would look like:
To update this dynamically, within a macro, we need only to map our input fields to their appropriate places in the request, emulating the rest of the request syntax with formula logic:
(the Replace function exchanges spaces for + characters; the remaining + characters are added as literal strings to mirror the format above)
Then all that remains is updating our key before passing the request to a Download Tool; the key is the same for all our input rows:
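The formula logic described above can be sketched in plain Python: swap spaces for + characters and splice the address and key into the documented Google Maps Geocoding request format. `YOUR_KEY` is a placeholder; a real server key comes from the Google Developers Console:

```python
def build_geocode_url(address, key="YOUR_KEY"):
    """Build a Google Maps Geocoding API request for an xml output."""
    encoded = address.replace(" ", "+")   # the Replace() step in the Formula tool
    return ("https://maps.googleapis.com/maps/api/geocode/xml"
            f"?address={encoded}&key={key}")

print(build_geocode_url("1600 Amphitheatre Parkway, Mountain View, CA"))
# https://maps.googleapis.com/maps/api/geocode/xml?address=1600+Amphitheatre+Parkway,+Mountain+View,+CA&key=YOUR_KEY
```

In the macro, the same string is assembled in a Formula tool, with the key swapped in once via an Interface tool rather than per row.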
The v10.5 example above is attached for reference. It is an adaptation of a more robust Google Maps Geocoder hosted on our gallery.
Please note that in order to use this macro, you must first generate a server key from the Google Developers Console. Each key has a limit of 2,500 free requests per day. Click here for more information on usage limits for the Google Maps API.
This macro demonstrates requests to Google Maps' web service API and is meant as a proof of concept only. It is not intended for production use.
To go along with our example on how to download a file from FTP, we’ve assembled steps in v10.1 below (credentials and server removed) as an example of uploading a file to FTP. In this example (attached) I’ve encoded a string field as a Blob to be posted as a text file. Theoretically, all your fields could be concatenated to a CSV format, or another delimited format, to be converted and posted using the same steps:
My field string to be converted:
1. First identify the field to be converted to Blob in your Blob Convert Tool:
2. Specify in a Formula Tool your FTP URL and filename in the format URL/filename.extension:
3. Have your Download Tool use this field as the URL field in the Basic Tab:
4. In the Payload tab specify the HTTP action PUT and select the option “Take Query String/Body from Field” and specify your Blob field:
5. Specify your credentials in the Connection tab of the Download Tool, leave all other configuration options default:
6. Run the workflow!
After running, you should be able to confirm the successful transfer of your file in the DownloadHeader field returned from the Download Tool (it'll also be hosted on your FTP path):
Take a look at the results below:
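For reference, the same transfer can be sketched outside Alteryx with Python's standard `ftplib`: encode the string field as bytes (the Blob Convert step) and store it on the server under your chosen filename (the Download tool's PUT step). The host, credentials, and filename here are placeholders, so the upload call itself is left commented out:

```python
from ftplib import FTP
from io import BytesIO

def to_blob(field_value):
    # String field -> binary payload, mirroring the Blob Convert step.
    return field_value.encode("utf-8")

def upload(host, user, password, filename, payload):
    # Connect, authenticate, and store the payload as filename on the server.
    with FTP(host) as ftp:
        ftp.login(user, password)
        ftp.storbinary(f"STOR {filename}", BytesIO(payload))

blob = to_blob("my field string to be converted")
# upload("ftp.example.com", "user", "password", "upload.txt", blob)  # placeholders

print(len(blob), "bytes ready to send")
```

Just as in the workflow, the payload is binary and the filename determines what appears on the FTP path after the transfer.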
.NET Framework is a software framework developed by Microsoft that supports the building and running of apps and XML web services. The framework version can have an impact on the installation and operation of Alteryx.
Tab 1 - Basic
URL : The URL for the resource you are trying to access must come from an upstream tool and is the only field Alteryx requires to configure the Download tool. Depending on the API you are pulling information from (or sending data to), other information may be required, such as headers and/or a payload.
The error above is usually due to a company firewall restriction. If you are receiving the error above when installing, please download and run the individual installers from this page: http://downloads.alteryx.com/downloads.html. You can choose admin or non-admin depending on the permissions on your machine.
If you are still having trouble downloading, don't hesitate to contact firstname.lastname@example.org.
If the API that you are working with requires you to sign or authenticate your requests, it may use an implementation of OAuth 2.0 or another authentication method to show that you have the access needed to consume the web service. There are some key words you can look for in the documentation of the API you are using that will help you quickly choose the appropriate grant flow in Alteryx.
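As an illustration of one common grant flow, client credentials (typical for machine-to-machine access), here is a sketch of the pieces you would assemble: a Basic auth header built from the client id and secret, a POST to the provider's token endpoint, and a Bearer header carrying the returned access token on each API call. All endpoint and credential values below are placeholders, not real services:

```python
import base64

def basic_auth_header(client_id, client_secret):
    # RFC 6749 client authentication: base64("id:secret") in a Basic header.
    raw = f"{client_id}:{client_secret}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")

# The token request you would configure in a Download tool (placeholders).
token_request = {
    "url": "https://auth.example.com/oauth2/token",
    "headers": {"Authorization": basic_auth_header("my-id", "my-secret")},
    "body": "grant_type=client_credentials",
}

# Once the server responds with {"access_token": "..."}, each API call
# carries it in a Bearer header:
api_headers = {"Authorization": "Bearer <access_token from the response>"}

print(token_request["headers"]["Authorization"])
```

Spotting "grant_type", "client_credentials", "authorization_code", or "refresh_token" in the API documentation is the quickest way to identify which flow you are dealing with.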