Joining Data
Hi Everyone,
In one file I have 20k rows, and in another file I have 50k rows.
I am looking to join each entry of file one with each row of file two.
Can anyone help with how I can do this?
Initially, I created a common key column to match the data: I populated "1" in all rows of both files.
Then I joined the data with the Join tool using that key as the criteria, but it took hours to run and hit a memory issue.
Is there any way to manage joins with huge data?
Solved! Go to Solution.
When you join, you join each matching entry of the two sources. That means if you try to join 20,000 records to 50,000 records, all on the value 1, your resulting dataset will be 20,000 * 50,000 = 1,000,000,000 records. How do you want this to work? Usually a key column holds unique values.
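The row-count math above can be checked with a tiny sketch. This is a minimal pandas illustration (the frames and sizes are made up for demonstration, not taken from the original files) of why joining on a constant key behaves like a cross join:

```python
import pandas as pd

# Two small frames standing in for the 20k- and 50k-row files;
# both get the same constant key, as described in the post.
left = pd.DataFrame({"key": 1, "a": range(3)})
right = pd.DataFrame({"key": 1, "b": range(4)})

# A join on a constant key matches every left row with every right row,
# i.e. a cross join producing len(left) * len(right) rows.
joined = left.merge(right, on="key")
print(len(joined))      # 3 * 4 = 12
print(20_000 * 50_000)  # 1,000,000,000 rows at the real sizes
```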
Yes, this is what I am trying to do.
Hi @SagarGite
You can also use the Append Fields tool with the configuration below for the cross join, and the configuration below for handling large data.
A cross join will take time; check whether this method is faster.
Hope this helps :)
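For reference outside Designer, the same cross join can be expressed directly in pandas without the artificial key. This is a hedged equivalent sketch, not the Append Fields tool itself; the file names are hypothetical:

```python
import pandas as pd

# Hypothetical file names; substitute your actual inputs.
left = pd.read_csv("file_one.csv")   # ~20k rows
right = pd.read_csv("file_two.csv")  # ~50k rows

# how="cross" (pandas >= 1.2) pairs every left row with every right row,
# matching what the Append Fields tool produces in Designer.
result = left.merge(right, how="cross")
```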
Hi @SagarGite
Either the join using 1 as the key or the Append Fields tool will work, but the outcome is 1,000,000,000 rows, which is a very large data set. Just appending two tables with one integer column each took almost a minute (37 seconds in AMP mode) on my 6-core Ryzen with 32 GB of RAM. With real data, 10-20 fields on both sides, this could take a very long time and eat up all your RAM. Once your RAM is full, your system will start swapping to disk, which is slower even with an SSD.
To minimize the intermediate RAM use, try the following:
1. Use Select tools to reduce the number of fields that you're joining. Only include the fields that you need to complete your operation. Avoid including large string fields, like descriptions, if possible.
2. Create a batch macro that joins and processes your records using only a group of records from the first side against all the records on the second (see the sketch after this list). Set your batch size to something like 200 rows to start. The process will take a while to loop through all the batches, but each batch will only use a small amount of memory.
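For readers who want the same batching idea outside of a batch macro, here is a minimal Python sketch, assuming both inputs fit in pandas DataFrames; the batch size, names, and output handling are illustrative assumptions:

```python
import pandas as pd

def batched_cross_join(left: pd.DataFrame, right: pd.DataFrame,
                       batch_size: int = 200):
    """Cross-join a slice of `left` to all of `right`, one batch at a
    time, so only batch_size * len(right) rows are in memory at once."""
    for start in range(0, len(left), batch_size):
        batch = left.iloc[start:start + batch_size]
        # Each chunk is small enough to process or write out immediately.
        yield batch.merge(right, how="cross")

# Usage sketch: stream each chunk to disk rather than holding
# a billion rows in RAM at once.
# for i, chunk in enumerate(batched_cross_join(left_df, right_df)):
#     chunk.to_parquet(f"chunk_{i}.parquet")
```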
Dan
@SagarGite Use the Append Fields tool to get the required result.
Please visit the Append Fields Tool | Alteryx Help page for configuration details.
Hope this helps :)
Thanks
Happy to help :) @SagarGite
Cheers and have a nice day!