I am trying to use the Join tool to combine multiple Excel files that result in over 1 million rows of payroll data. This seems to overwhelm the server and/or take forever to run. Is there a way to condense the data or otherwise make this possible?
Can you provide some sample data, both input and output? How many records are in your inputs? If you're not expecting 1M+ records after the join, you may need to join on more fields to eliminate any cross joins that are happening.
Attached is a small sample of the input data. My goal is to combine 5 files, each containing about 300,000 unique lines of this data and then output files containing the data from each unique Ledger Account (Column G).
Thanks. What is the reason for the join you mentioned? It sounds like you would want to union the files together and then maybe use a Summarize tool to group by Ledger Account and sum the debit/credit amounts.
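If it helps, here's a rough pandas sketch of that union-then-summarize idea outside of Alteryx. The file pattern and the column names (Ledger Account, Debit, Credit) are just guesses from this thread, so swap in your real paths and headers:

```python
import glob

import pandas as pd

# Union: stack the five extracts instead of joining them.
# File pattern and column names are assumptions, not your actual schema.
files = glob.glob("payroll_extract_*.xlsx")
combined = pd.concat((pd.read_excel(f) for f in files), ignore_index=True)

# Summarize: group by Ledger Account and sum the debit/credit amounts.
summary = combined.groupby("Ledger Account", as_index=False)[["Debit", "Credit"]].sum()

# One output file per unique Ledger Account (Column G in the sample).
for account, rows in combined.groupby("Ledger Account"):
    rows.to_excel(f"ledger_{account}.xlsx", index=False)
```

The point is that a union grows the row count linearly (5 x 300K = 1.5M rows), whereas a bad join can multiply it, which is usually what overwhelms the machine.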
Honestly, I haven't seen a good answer for joining a large table to a small dataset when it isn't possible to limit the records from the large table prior to the join in Alteryx. I'd suggest using SAS for joins across different environments or for big data.
@Time - this really depends upon your skill set with SAS vs. Alteryx and on institutional support. I do this all the time in Alteryx, so I don't have a problem with it. You or your company may have a great SAS team, in which case this could be much easier in SAS. I'm sure both platforms can provide this capability.
Hey, sounds good. So it is possible to join a small dataset to a million-row dataset. Could you explain at a high level which tools or process you use in Alteryx when it isn't possible to limit the large dataset prior to the join?
First off, I don't see a million-row dataset as a large dataset. I see hundreds of millions and greater as large. So if it's 1 million, I just use standard tools and don't really have an issue, unless I'm dealing with expansive joins; then obviously I have to do some grouping and prep beforehand. If it's 9 figures or more, then hopefully the data is in a DB and I use the In-DB tools: I create dynamic where clauses from the smaller dataset and use Dynamic Input In-DB. 8 figures is kind of a grey area; it depends upon the data.
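To make the dynamic where clause idea concrete, here's a rough Python sketch of the same pattern done by hand. The connection, table, and column names are made up for illustration (sqlite stands in for your actual database):

```python
import sqlite3

import pandas as pd

conn = sqlite3.connect("warehouse.db")  # stand-in for your DB connection

# The keys come from the smaller dataset; names here are hypothetical.
small = pd.read_excel("small_lookup.xlsx")
keys = small["employee_id"].unique().tolist()

# Pull only matching rows from the big table, batching the IN list so it
# stays a manageable size -- the same effect as a dynamic where clause.
batch_size = 1000
chunks = []
for i in range(0, len(keys), batch_size):
    batch = keys[i:i + batch_size]
    placeholders = ", ".join("?" * len(batch))
    sql = f"SELECT * FROM payroll_detail WHERE employee_id IN ({placeholders})"
    chunks.append(pd.read_sql(sql, conn, params=batch))

matched = pd.concat(chunks, ignore_index=True)
result = matched.merge(small, on="employee_id")  # the join, now on a pre-filtered set
```

The win is that only the rows that can possibly match ever leave the database.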
So basically I do some form of Summarize tool / Dynamic Input In-DB (roughly the pattern sketched above), and then I can extract what I need using Data Stream Out. Note: if you want, you could also create a temporary table on your DB using Data Stream In and then use a Join In-DB. I tend to use the dynamic where clause instead.
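And here's a sketch of that temporary-table alternative (the Data Stream In followed by Join In-DB route), again with stand-in names and sqlite standing in for your DB:

```python
import sqlite3

import pandas as pd

conn = sqlite3.connect("warehouse.db")  # stand-in connection

# "Data Stream In": push the small dataset into a temporary table.
small = pd.read_excel("small_lookup.xlsx")
small.to_sql("tmp_lookup", conn, if_exists="replace", index=False)

# "Join In-DB": let the database do the join so only matched rows come back.
sql = """
SELECT p.*
FROM payroll_detail AS p
JOIN tmp_lookup AS t
  ON t.employee_id = p.employee_id
"""
result = pd.read_sql(sql, conn)
```

Both routes end the same way; the temp table just lets the database optimizer handle the matching instead of a generated IN list.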
Great info, thanks. I just found that the Dynamic Input In-DB tool will allow an IN variable, after asking our resident expert here. So you just need to connect the "smaller" dataset to the Dynamic Input In-DB tool and update the where clause to include an IN statement on the key field.
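For anyone landing here later, the substitution that setup performs looks roughly like this; everything in the snippet is hypothetical and only meant to show the shape of it:

```python
# Illustrative only: a placeholder in the template query gets replaced
# with a list built from the "smaller" input dataset.
template = "SELECT * FROM payroll_detail WHERE ledger_account IN (ACCOUNT_LIST)"
accounts = ["'4000'", "'4100'", "'4200'"]  # key values from the smaller dataset
query = template.replace("ACCOUNT_LIST", ", ".join(accounts))
# query -> SELECT * FROM payroll_detail WHERE ledger_account IN ('4000', '4100', '4200')
```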