We have extended our Early Bird Tickets for Inspire 2023! Discounted pricing goes until February 24th. Save your spot!

Alteryx Designer Discussions

Find answers, ask questions, and share expertise about Alteryx Designer and Intelligence Suite.

Join Tool - Very Large Data Set

nwilder
6 - Meteoroid

Hi there,

 

I am trying to use the Join tool in order to combine multiple Excel files that result in over 1 Million rows of Payroll data.  This seems to overwhelm the server and/or take forever to run.  Is there a way to condense the data or make this possible in some way?

 

Thanks for your time.

10 REPLIES 10
Luke_C
17 - Castor

Hi @nwilder 

 

Can you provide some sample data - both input and output? How many records are your inputs? If you're not expecting 1M+ records after the join you may need to join on more fields to eliminate any cross joins that are happening.

nwilder
6 - Meteoroid

Attached is a small sample of the input data.  My goal is to combine 5 files, each containing about 300,000 unique lines of this data and then output files containing the data from each unique Ledger Account (Column G).  

Luke_C
17 - Castor

Thanks, what is the reason for the joining you mentioned? It sounds like you would want to union the files together and then maybe use a summarize tool to group by Ledger Account and sum the debit/credit amount?

apathetichell
17 - Castor

is this on Sever or Designer? Honestly this should be doable in either so there's probably an issue with your underlying workflow logic. I can't diagnose more than that without a sample of it.

Time
5 - Atom

honestly, i haven't seen a good answer to joining a large table to a small dataset when i isn't possible to limit the records from the large table prior to the join in Alteryx.  Suggest to use SAS for joining different environments or big data

apathetichell
17 - Castor

@Time- this really depends upon your skillset with SAS vs Alteryx and institutional support. I do this all of the time in Alteryx so I don't have a problem. You/your company may have a great SAS team so this could be much easier in SAS. I'm sure both platforms can provide this capability.

Time
5 - Atom

Hey sounds good, so it's possible to join a small row dataset to a million row dataset.  Could you explain at a high level what tools or process you use in Alteryx when it isn't possible to limit the large table dataset prior to the join in Alteryx?

 

apathetichell
17 - Castor

First off - I don't see a million row dataset as a large dataset. I see hundreds of millions and greater as a large dataset. So if it's 1 million I just use standard tools and don't really have an issue unless I'm dealing with expansive joins. Then obviously I have to do some grouping & prep beforehand. If it's 9 figures or more - then hopefully the data is in a DB and I use the In-DB tools. I create dynamic where clauses from the smaller dataset and use dynamic input in-db. 8 figures is king of the grey area. depends upon the data.

 

so basically I do some form of summarize tool/dynamic input in-db and then I can extract what I need using datastream out. Note - if you want you could also create a temporary table on your DB using datastream in and then use a Join In-DB. I tend to use the dynamic where clause instead.

 

 

Time
5 - Atom

Great info thanks, i just found the dynamic input join tool will allow a in variable after asking our resident expert here.  So, you just need to connect the "smaller" dataset to the dynamic input join tool and update the where clause to include an in statement of the key field.

Labels