Community Spring Cleaning week is here! Join your fellow Maveryx in digging through your old posts and marking comments on them as solved. Learn more here!

Alteryx Designer Desktop Knowledge Base

Definitive answers from Designer Desktop experts.

Automating File Transfer Protocol (FTP) Downloads

MattD
Alteryx Alumni (Retired)
Created

So we’re now downloading all the network-shared documents we want thanks to instructions posted on our Knowledge Base, and we’re on our way to masteringFTP in Alteryx. But what if we want to take it a step further? A lot of our users rely on FTP as a drop zone for datasets that are generated periodically (e.g. weekly, monthly, or quarterly datasets). We should then be able to schedule a workflow to coincide with those updates, automatically select the most recent dataset, crank out all the sweet data blending and analytics we have in our scheduled workflow, and proceed with the rest of our lives, right? Right. We can do just that, and with a little work up front, you can automate your FTP download and analysis to run while you’re enjoying the finer things in life. Here’s how in v10.1:


Say the file you want to download is ftp://ftp.url.here/FTP Example File.xlsx :


1.png


Each time period, this file is listed at the URL ftp://ftp.url.here/ :


2.png


If we use this URL in our Download Tool and specify in the Basic tab to output to a string field (a), otherwise keeping our configuration the same, most servers will return all the files listed at that URL (b).


(a) Basic Tab Configuration


3.png


(b) Response from Download Tool


4.png


While the most recent version of the file will usually be recognized by the latest date and/or time in the first two fields (we’ll parse those out, too), we’re going to go ahead and use the date stamps in the file names since that’s more difficult and we’re hardcore. Depending on the formats here the approach will vary, but we’ll use regex in this case. First, let’s split the single response cell we have into multiple rows using the Text To Columns Tool:


texttocolumns.png

The file list will now look like the following:


6.png


Now let’s unleash the power of regex:


regex1.PNG


After the parsing above, the file list will look like the below:


8.png

Just apply one more regular expression to get the date stamps out of the file names:

regex2.PNG


Which looks like:


10.png


Thenwe’ll throw in a little logic to use either a posted date or a file name date stamp (both converted to date types for sorting later), if provided:


formula1.PNG


Now that we’ve isolated the dates we’ll leverage to determine the file to open, we can simply sortdescending by the Date field, sample the first record, and create a new URL for our FTP request using the file name corresponding to our date of choice:


formula2.png


Once that’s done, all we need to do is follow the FTP instructions we had before and schedule the workflow to run just after the files are dropped!

Comments
NJT
11 - Bolide

Awesome, thanks for sharing this is exactly what I was looking for now if I can just get the external vendors SFTP and our firewall and the Altyerx software all working together I'll be in business!

jcardoso
8 - Asteroid

Hi,

i was following the process (more or less) and it works, but what i really want is to create like a macro that updates the input parameters (host, user, pass, folderPath) of an FTP transfer connector, given before in different columns, and outputs the list of files in those ftps. The idea is to create like a mapping for the input parameters, where the user, later, on the macro itself, can choose from which columns the parameters are fed.
This works when i'm passing the URL (host+folderPath) before the macro and updating only the user and password, but not working when taking host+folderPath dinamically. 
Some help would be very much appreciated, as i'm unable to attach a workflow, so i post the macro image.
All the control parameters actions are updating the expression value only (default).
The formula tool is to create the URL string with the updated inputs, (host+folderPath) before inserting into the FTP connector, but i guess is this part which is not working as a batch macro.

auto_ftp_macro_.jpg

 

 

Raghu_s
8 - Asteroid

Hi @MattD

 

This concept looks cool. I have a use case that is little different, would like to know some benchmarks or best practices that could help. 

 

1. What if my input files vary 20 - 40 Gigs in size which are at different share drive aka networks (not hosted FTP locations) ?  

2. The process is slow to read such huge file which cannot be broken down. 

3. This usually happens when I pick files from a distant server. (overseas - physically servers are at a different country within organization limits)

4. What would be an alternate solution to handle such files as we read such universe files frequently which gets updated? So moving every time to local is again slow.   

 

So ,

1. Does download tool save time while doing such operations? 

2. Do they allow such huge files that resides in an FTP sever to be read fast? 

3. Do you suggest to move such files from shared network to FTP so this can ease the process? 

 

eagerly waiting for some answers and advice.

 

Thanks in advance. 

 

harsha384
8 - Asteroid

hi Guys,

 

Thank you for the information. Its really useful. I have another situation where i download file from FTP locations, and i have sorted the list of files based on date. Now i want to upload the sorted files on the shared drive. How should I achieve it? I tried using download tool, but invain. Any quick help?

 

Thanks

Harsh

scottgrossnickle
5 - Atom

Hi,

 

I am having an issue with the download tool. I have followed the instructions, and even tested the URL to ensure that it works, but when the tool goes to actually download the files from the FTP I get the following:

 

Any idea how to fix this issue?

 

FTP 302 Error.jpg

Ferrari
7 - Meteor

Hi ,

In the text input I inserted a second column where I indicated the path to save the data; after which I entered the "filename from a field" in the basic third banner window by selecting the column with the specified path. But when I send the workflow to run, I get the following error message:
  "Error: Download (1): Error Opening file: C: \ Users \ Documents \ download: Access denied. (5)".

I modified the path and inserted the name of the files that should download but the zip file that is generated is empty, and then reading the result in the browser tool known this:

"DownloadHeaders"

"HTTP / 1.1 401 Unauthorized"


Thank you in advance for availability

Paulo1300
8 - Asteroid

how do you get the FTP download to show the full file upload date? Mine never shows the year only the month, day, and time. I have tried with multiple FTP servers, and all of them show the year in the filezilla but no year in alteryx.

Paulo1300
8 - Asteroid

@MattD any thoughts on my comment above? In your screenshot, it shows your date as "02-08-16". Notice in my screenshot below it shows the date like so:

 

Capture.PNG

 

How did you get your date to show the year in the download tool? I have tried multiple FTP servers for clients (ours and theirs) and they all appear like this in alteryx.

Tankitaz
7 - Meteor

Hi @MattD , thanks for your answers, it helps a lot. Then I got a question, once I get the full path like this: 

sftp://sftpxxxx.com/xxx/123xxxxabcdefg.XML, I got a file looked like this, all the data were crowded into one column names "DownloadData",  then how can I download this file to the alteryx memory? or how can I download this xml file wtih the same format?

Thanks.

 

Capture.PNG

Paulo1300
8 - Asteroid

If you want it in alteryx memory, you need to parse the data somehow. Typical tools to do this include tokenize, text to column, etc.

If you prefer to output the file and then use it in a different workflow you can output using this configuration in the download tool:

Capture.PNG

Tankitaz
7 - Meteor

Great, thanks a lot!

spenc215
5 - Atom

I've been able to download a file from an FTP as outlined here, but I am unable to output to a "string" as outlined in this Automated process.

 

In the Text Input tool that precedes the Download Tool, I enter the URL pointing to the FTP location (the particular file is not included in this URL Path)

In the Download Tool, I select the Output -> To a Field -> String as outlined above. 

 

This is where things get wonky.... the "Response from the Download Tool (b)" in my workflow is nothing like what the screenshot shows, where there are clear line items for each file that include a date and time stamp...

 

From the procedure outlined above

spenc215_0-1590605608139.png

 

From my workflow

spenc215_0-1590605962896.png

 

 

You can see that all the files in the FTP location are contained in the "DownloadData" cell, but they do not include a date/time stamp, and I'm unsure of how to parse this data to reference the particular file I'm interested in...

 

Any ideas? I'm trying to automate a daily pull of a report and conceptually this process seems like a good fit; just having a tough time getting going 😛

 

Paulo1300
8 - Asteroid

yeah i had the same concern when i started. every FTP site I have ever connected to looks like yours and I have always wished it looks like the screenshot posted in this thread. in the end what i have noticed is that if the data is not from the current year, it includes the year.. otherwise the data is from the current year.

 

Paulo1300_0-1590693521019.png

 

in regard to parsing it, i just do text to columns and break it by space. then i just select the columns i need and move on. the only time it is different is if you are pulling a file from a different year, so you can add some logic to account for that.

 

---------

 

in regard to your other issue, unless you are trying to pull the directory to know the name of the file (or whether it has been updated), i am not sure why you need to parse the FTP directory anyways. if you are trying to pull the data to a string, you need to include the file name in your URL. if the directory is being returned to you, then you didnt include the file name in the URL

 

Rahul3
8 - Asteroid

Hi @Paulo1300 

 

Do you get the solution for your problem. I am also getting the same response from ftp.

Paulo1300
8 - Asteroid

@Rahul3 no, i still use the process above (text to columns by space). then you just know that if there is a "time" in that one column, it is the current year, otherwise it is the year in that column

i basically do the following for the date (after doing a text to column tool based on space):


if contains([DownloadData8],":")
then [DownloadData6]+" "+[DownloadData7]+" "+left(datetimenow(),4)
else [DownloadData6]+" "+[DownloadData7]+" "+[DownloadData8]
endif

then i do a formula to convert the output of that to an alteryx date
DateTimeParse([File Date],"%B %d %YYYY")

 also one thing that doesnt work with this process is that if the file has spaces in it, then file name will be spread out into multiple columns.. the way i get around that is the text to columns has to = 9 and you set "leave extra in last column". this works since it seems like everyone i pull from is 9 columns and the last column is always file name