Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Import, Filter and Combine Large Data Excel files (.xlsx)

aba
7 - Meteor

Hi All,

 

Really hoping I can get some help from anyone here.

 

Essentially, I've got a request to build a workflow that goes into folders categorised on a monthly basis picks up some files, combines them and generates an output for review. The challenge is each file is very large (on average 60-100MB and over 700k lines) and we're looking to open each file (circa 60 in total), filter on the line items by "Product" for example, combine the output then filter on the next file. I've created a macro that can read each file and carry out the process but my challenge is something always seems to break and the time it takes to run the workflow is really long. My question is, is there a way to compress the file, or read it in as a .yxdb even though its a .xlsx file and then carry out the other processes afterwards? 

 

Appreciate the help!

 

5 REPLIES 5
jdminton
12 - Quasar

Excel files have to be read in through the Excel input due to the encoding in the .xlsx file itself. You can create a macro to read in each file and export to .csv or .yxdb (still separate) and then use your other macro to filter and combine. That option may speed it up some. I'd be interested if you've successfully used the macro on a single file and how the results timed for the test. I've run into similar issues and it was usually a bad join that was duplicating information. I'm not saying that's happening here, but there may be something similar..

apathetichell
18 - Pollux

@abadepending upon the ram on your system you are running on - there shouldn't be a major limitation of in memory processing of 60 700k line excel files. That's around 42mm rows - and I can 100% tell you that Alteryx can (memory and processor dependent) process that much data. If you are seeing an issue and you are running on a 32gb or larger system - you should look into troubleshooting your process. I would also make sure that all excel files are local vs network locations because network locations could experience timeout locations mid workflow.

 

I would strongly recommend looking into a piecemeal process with intermediate storage (ie local .yxdb/DB/etc) if the files are in a network location.

aba
7 - Meteor

Thanks a lot @jdminton! That strategy seems to be the only way around it although I was hoping there might be a different way as the process will be run by users via Alteryx Server and ideally would've wanted something more convenient. The macro itself works just fine and the data output from the macro is as it should be. It's just a lot of files and lots of lines. Appreciate the help.

aba
7 - Meteor

@apathetichell working with a 16GB RAM at the moment so that might be the issue. Definitely going with the piecemeal strategy as it seems to work fine. Appreciate the help!

jdminton
12 - Quasar

Glad it helped!

 

I haven’t used server much, but if you can use CReW macros, you can set the workflows up to run sequentially using the runner and conditional runner tools. That would at least reduce to one step. I think you can also chain the apps to run sequentially in server. https://community.alteryx.com/t5/Alteryx-Designer-Desktop-Knowledge-Base/CReW-Macro-Support/ta-p/465...

Labels