Data file save and load slow on network drives
Because some data sources are only available on network drives, and in order to share data with colleagues, I need to load data (especially Excel files) from network drives and save files to them. Unfortunately, this is very, very slow.
If I compare saving the same file to a local drive and to a network drive, the network drive takes orders of magnitude longer. As a workaround, I have started to manually copy the input files from the network drive to my local hard drive, run the script, and manually copy the result files back to the network drive. Even with all the manual steps, this is still far faster than saving directly to the final destination.
I know that Excel import and export is slow, but the same happens (on a different scale) with Alteryx's native database format.
Also, I have observed that even saving the Alteryx scripts to a network drive takes very long compared to saving them locally. (Gallery is not an option for me because it constantly asks me for PKI credentials.)
I have the impression that Alteryx has a fundamental problem with slow data connections/network drives.
If a direct speed-up is not possible, why doesn't Alteryx just do in the background what I do manually: copy the input files to a temporary directory, run the script, and copy the output files back to the destination?
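For anyone who wants to script the copy/run/copy-back routine described above instead of doing it by hand, here is a minimal sketch in Python. It is not how Alteryx works internally. The function name `run_staged` and the `process` callable are hypothetical placeholders: `process` stands for whatever runs the actual workflow on the local copy and returns the path of the local result file.

```python
import shutil
import tempfile
from pathlib import Path


def run_staged(network_input: Path, network_output_dir: Path, process) -> Path:
    """Copy the input to a local temp dir, process it there, copy the result back.

    `process` is a hypothetical callable: it takes the local input path and
    returns the path of the local output file it produced.
    """
    with tempfile.TemporaryDirectory() as tmp:
        local_input = Path(tmp) / network_input.name
        shutil.copy2(network_input, local_input)    # one bulk read from the share

        local_output = process(local_input)         # all heavy I/O stays local

        network_output_dir.mkdir(parents=True, exist_ok=True)
        final_path = network_output_dir / local_output.name
        shutil.copy2(local_output, final_path)      # one bulk write to the share
        return final_path
```

The point of the design is that the share is touched exactly twice, each time with one large sequential copy, which is the access pattern network drives handle best.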
I would like to know if others have similar experiences and maybe workarounds/solutions.
(Unfortunately, I cannot find any labels that fit my issue.)
- Labels:
- Download
- Tips and Tricks
- Workflow
I've noticed this as well, and I believe it's more a product of the network/VPN. If you open the files on the network drive directly in Excel, I'd expect them to take longer than they do from your local machine, too. I came to this conclusion because the same workflows ran much faster on Alteryx Server (which is on the same network/server as the network drive where the files are).
Unfortunately I don't have a suggestion though.
File format: Excel xlsx
Spreadsheet size: 19 x 1048575 cells
File size: 69,990 KB
Runs: 3 to 4, except Alteryx network: 1
The Alteryx script contains only one Input Data tool.
Results (all times in MM:SS):
Task | Set up Input Data tool in Alteryx | Click on existing tool in Alteryx | Run task: max time | Run task: average time | Run task: median |
Copy from network to local disk | N/A | N/A | 00:49 | 00:29 | 00:22 |
Open in Excel from local disk | N/A | N/A | 00:30 | 00:30 | 00:30 |
Open in Excel from network | N/A | N/A | 01:32 | 01:19 | 01:15 |
Open in Alteryx from local disk | 00:10 | 00:47 | 00:44 | 00:44 | |
Open in Alteryx from network | 19:39 | 16:02 | 41:35 | 41:35 | 41:35 |
Comparing the run times gives the following interesting observations:
- Loading time in Excel increases by a factor of 2.5 when opening from the network drive compared to the local disk.
- Loading time in Alteryx increases by a factor of 55 when opening from the network drive compared to the local disk.
- Opening in Alteryx from the local disk is 1.5 times slower than in Excel.
- Opening in Alteryx from the network drive is 31 times slower than in Excel.
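The factors in the list above can be roughly re-derived from the table. The sketch below uses the "Run task: average time" column as printed in the table (that choice is an assumption; using the median instead shifts the Excel figures slightly, so the results land close to the quoted factors rather than exactly on them).

```python
def to_seconds(mmss: str) -> int:
    """Convert an MM:SS string from the results table to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


# Average run times taken from the table above.
avg = {
    "excel_local": to_seconds("00:30"),
    "excel_network": to_seconds("01:19"),
    "alteryx_local": to_seconds("00:44"),
    "alteryx_network": to_seconds("41:35"),
}

factors = {
    "Excel, network vs. local": avg["excel_network"] / avg["excel_local"],
    "Alteryx, network vs. local": avg["alteryx_network"] / avg["alteryx_local"],
    "Local, Alteryx vs. Excel": avg["alteryx_local"] / avg["excel_local"],
    "Network, Alteryx vs. Excel": avg["alteryx_network"] / avg["excel_network"],
}

for label, factor in factors.items():
    print(f"{label}: {factor:.1f}x")
```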
Even if I could accept long run times when executing the process, I cannot accept that Alteryx needs so long just to add the Input Data tool to my workflow.
Let me try to interpret: if the data transfer rate were what slows Alteryx down, the slowdown factor should be similar to the one in Excel (around 2 to 3). Thus, there must be a different cause.
I suspect that the data loading algorithm in Alteryx simply copes badly with higher latency. I believe Alteryx loads lots of small packets in a synchronous process, waiting for each one before requesting the next, rather than loading one big block of data at a time.
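To illustrate why many small synchronous requests would hurt so much more than a lower data rate, here is a back-of-the-envelope model. It says nothing about what Alteryx actually does. The round-trip time, bandwidth, and chunk sizes are made-up assumptions; only the file size comes from the measurements above.

```python
def transfer_seconds(file_bytes: int, chunk_bytes: int,
                     round_trip_s: float, bandwidth_bytes_per_s: float) -> float:
    """Simple model: every chunk costs one round trip, plus raw transfer time."""
    requests = -(-file_bytes // chunk_bytes)  # ceiling division
    return requests * round_trip_s + file_bytes / bandwidth_bytes_per_s


FILE_BYTES = 69_990 * 1024        # file size from the test above
ROUND_TRIP_S = 0.020              # assumed 20 ms latency over VPN
BANDWIDTH = 10 * 1024 * 1024      # assumed 10 MiB/s throughput

small_chunks = transfer_seconds(FILE_BYTES, 4 * 1024, ROUND_TRIP_S, BANDWIDTH)
large_chunks = transfer_seconds(FILE_BYTES, 1024 * 1024, ROUND_TRIP_S, BANDWIDTH)

print(f"4 KiB chunks: {small_chunks:.0f} s")
print(f"1 MiB chunks: {large_chunks:.0f} s")
```

With these assumed numbers the small-chunk case takes minutes while the large-chunk case takes seconds, even though the bandwidth is identical. That is the kind of gap that a pure data-rate explanation cannot produce.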
My conclusion is: the Alteryx team should look into the data loading and saving algorithms and change them so that they load or save big blocks of data and avoid synchronous communication with the server.
Is there a way to make sure that this post reaches the Alteryx development team?
Support@alteryx.com is your best bet. I can certainly commiserate with what you're seeing. Unfortunately I don't have a workaround, but maybe someone else does.
We have the same issue with slow transfers to and from network drives when connected over VPN to server drives. This works for me: I copy the source data to a local drive and perform all analysis and file saves locally. Then, at the end of the workflow, I use a Run Command tool with ROBOCOPY.EXE to move the results to the network drive. ROBOCOPY works great for large files over a network.
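As a rough sketch of what such a ROBOCOPY call could look like when driven from a script rather than from a Run Command tool: the helper names below are hypothetical, and the retry and wait values are arbitrary assumptions. `/Z` (restartable mode), `/R:n` (retries), and `/W:n` (wait between retries) are standard robocopy options. This version copies; adding `/MOV` would delete the source files after copying.

```python
import subprocess


def build_robocopy_command(source_dir, dest_dir, pattern="*.*",
                           retries=3, wait_s=5):
    """Build a robocopy command line as an argument list (Windows only)."""
    return [
        "robocopy", str(source_dir), str(dest_dir), pattern,
        "/Z",               # restartable mode, useful on flaky VPN links
        f"/R:{retries}",    # number of retries on failed copies
        f"/W:{wait_s}",     # seconds to wait between retries
    ]


def robocopy_succeeded(exit_code: int) -> bool:
    """Robocopy exit codes are bit flags; values below 8 mean no failures."""
    return exit_code < 8


def copy_results(source_dir, dest_dir) -> bool:
    """Run robocopy and report success. Requires robocopy on the PATH."""
    completed = subprocess.run(build_robocopy_command(source_dir, dest_dir))
    return robocopy_succeeded(completed.returncode)
```

Note that a plain "non-zero exit code means error" check would be wrong here, because robocopy returns 1 when files were copied successfully.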
No issues at our end. All of our data is on NAS...even Sharepoint.
The file that you reference in this post...is Alteryx reading in 16384 or so fields/columns, mostly blank/null/empty?
I've experienced this. The source file's sheet was formatted from A1:XFD1048576.
If so, you will need to reset the last used cell. I've used VBA to accomplish this.
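A quick way to check whether a workbook is affected, without opening it in Excel, is to look at the used range each sheet declares inside the .xlsx file. The sketch below relies only on the fact that an .xlsx file is a ZIP archive whose worksheet XML parts usually carry a `dimension` element near the top. The element is optional, so a missing value proves nothing, and the function name is hypothetical. It only diagnoses the problem; it does not reset the last used cell.

```python
import re
import zipfile

# Matches e.g. <dimension ref="A1:XFD1048576"/> in a worksheet XML part.
DIMENSION_RE = re.compile(rb'<dimension[^>]*\bref="([^"]+)"')


def declared_dimensions(xlsx_path) -> dict:
    """Return {worksheet part name: declared used range or None}."""
    result = {}
    with zipfile.ZipFile(xlsx_path) as archive:
        for name in archive.namelist():
            if name.startswith("xl/worksheets/") and name.endswith(".xml"):
                with archive.open(name) as sheet:
                    head = sheet.read(4096)  # the element sits near the top
                match = DIMENSION_RE.search(head)
                result[name] = match.group(1).decode() if match else None
    return result
```

If a sheet that should hold 19 columns reports something like `A1:XFD1048576`, the reader is probably being asked to scan far more cells than contain data.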
