Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
GELÖST

Joining Data

SagarGite
Meteor

Hi Everyone,

In one file I have 20k rows and in another file, I have 50k rows.

I am looking to join data of file one of reach entry with file two's each row.

can anyone help, how I can do this?

Initially, I created a common key column to match data with each other like in the key column I populated "1" in all rows of both files.

Then joined data by join tool with Key criteria but it took hours to run and showed memory issue.

Is there any way to manage joins with huge data? 

7 ANTWORTEN 7
apathetichell
Altair

when you join you join each matching entry of each of the two sources. What that means is that if you try to join 20,000 records on 50,000 records each on the value of 1- your resulting dataset would be 20,000*50,000 records. How do you want this to work? usually a key column is a unique value.

SagarGite
Meteor

Yes this is what I am trying.

atcodedog05
22 - Nova
22 - Nova

Hi @SagarGite 

 

You can also use append tool with below config for cross join and use below config for handling large data.

atcodedog05_1-1646560578989.png

 

Cross join will take time check whether this method is faster.

 

Hope this helps : )

 

atcodedog05
22 - Nova
22 - Nova

-

danilang
19 - Altair
19 - Altair

Hi @SagarGite 

 

Either the join using the 1 as a key or the Append fields tools will work, but the outcome of this is 1,000,000,000 rows, which is an very data set.  Just appending 2 two tables with 1 integer column each took almost a minute(37 sec in AMP mode) on my 6 core Ryzen with 32GB RAM.  With real data, 10-20 fields on both sides, this could take a very long time and eat up all your RAM.  Once your RAM is full your system will start swapping to disk, a slower operation even with an SSD.  

 

To minimize the intermediate RAM use, try the following. 

 

1. Use Select tools to reduce the number fields that you're joining.  Only include the fields that you need to complete your operation.  Avoid including large string fields, like descriptions, etc, if possible.

 

2. Create a batch macro that will join and process your records using only a group of records from the first side to all the records on the second.  Set your batch size to something like 200 rows to start.  The process will take a while to loop through all the batches, but each batch will only use a small amount of memory.

 

Dan    

Alteryx_AR
Quasar

@SagarGite  Use append tool to get the required result.

 

Please visit page Append Fields Tool | Alteryx Help for configuration details.

 

Hope this helps :)

 

Thanks 

 

atcodedog05
22 - Nova
22 - Nova

Happy to help : ) @SagarGite 

Cheers and have a nice day!

Beschriftungen
Top-Lösungs-Autoren