Joining Data
Hi Everyone,
In one file I have 20k rows, and in another file I have 50k rows.
I am looking to join each entry of file one with each row of file two.
Can anyone help with how I can do this?
Initially, I created a common key column to match the data: I populated "1" in all rows of both files.
Then I joined the data with the Join tool using that key as the criteria, but it took hours to run and hit a memory issue.
Is there any way to manage joins with huge data?
Solved! Go to Solution.
When you join, you join each matching entry of the two sources. That means if you try to join 20,000 records to 50,000 records, all on the value 1, your resulting dataset will be 20,000 * 50,000 = 1,000,000,000 records. How do you want this to work? Usually a key column holds unique values.
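The row-count math above can be checked with a tiny sketch. This is a minimal pandas illustration (the frames and sizes are made up for demonstration, not taken from the original files) of why joining on a constant key behaves like a cross join:

```python
import pandas as pd

# Two small frames standing in for the 20k- and 50k-row files;
# both get the same constant key, as described in the post.
left = pd.DataFrame({"key": 1, "a": range(3)})
right = pd.DataFrame({"key": 1, "b": range(4)})

# A join on a constant key matches every left row with every right row,
# i.e. a cross join producing len(left) * len(right) rows.
joined = left.merge(right, on="key")
print(len(joined))      # 3 * 4 = 12
print(20_000 * 50_000)  # 1,000,000,000 rows at the real sizes
```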
Yes, this is what I am trying to do.
Hi @SagarGite
You can also use the Append Fields tool with the configuration below for the cross join, and the configuration below for handling large data.
A cross join will take time; check whether this method is faster.
Hope this helps :)
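For reference outside Designer, the same cross join can be expressed directly in pandas without the artificial key. This is a hedged equivalent sketch, not the Append Fields tool itself; the file names are hypothetical:

```python
import pandas as pd

# Hypothetical file names; substitute your actual inputs.
left = pd.read_csv("file_one.csv")   # ~20k rows
right = pd.read_csv("file_two.csv")  # ~50k rows

# how="cross" (pandas >= 1.2) pairs every left row with every right row,
# matching what the Append Fields tool produces in Designer.
result = left.merge(right, how="cross")
```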
Hi @SagarGite
Either the join using 1 as the key or the Append Fields tool will work, but the outcome is 1,000,000,000 rows, which is a very large data set. Just appending two tables with one integer column each took almost a minute (37 seconds in AMP mode) on my 6-core Ryzen with 32 GB of RAM. With real data, 10-20 fields on both sides, this could take a very long time and eat up all your RAM. Once your RAM is full, your system will start swapping to disk, which is slower even with an SSD.
To minimize the intermediate RAM use, try the following:
1. Use Select tools to reduce the number of fields that you're joining. Only include the fields that you need to complete your operation. Avoid including large string fields, like descriptions, if possible.
2. Create a batch macro that joins and processes your records using only a group of records from the first side against all the records on the second (see the sketch after this list). Set your batch size to something like 200 rows to start. The process will take a while to loop through all the batches, but each batch will only use a small amount of memory.
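For readers who want the same batching idea outside of a batch macro, here is a minimal Python sketch, assuming both inputs fit in pandas DataFrames; the batch size, names, and output handling are illustrative assumptions:

```python
import pandas as pd

def batched_cross_join(left: pd.DataFrame, right: pd.DataFrame,
                       batch_size: int = 200):
    """Cross-join a slice of `left` to all of `right`, one batch at a
    time, so only batch_size * len(right) rows are in memory at once."""
    for start in range(0, len(left), batch_size):
        batch = left.iloc[start:start + batch_size]
        # Each chunk is small enough to process or write out immediately.
        yield batch.merge(right, how="cross")

# Usage sketch: stream each chunk to disk rather than holding
# a billion rows in RAM at once.
# for i, chunk in enumerate(batched_cross_join(left_df, right_df)):
#     chunk.to_parquet(f"chunk_{i}.parquet")
```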
Dan
@SagarGite Use the Append Fields tool to get the required result.
Please visit the Append Fields Tool | Alteryx Help page for configuration details.
Hope this helps :)
Thanks
Happy to help :) @SagarGite
Cheers and have a nice day!