Hi,
I'm trying to join 2 different files based on a condition.
This is my 1st file data
| Fund | Security |
|---|---|
| 1A | 89236THM0 |
| 2B | 05253JAU5 |
| 3C | USD |
This is my 2nd file data
| Fund | ISIN | Cusip | BBSEC |
|---|---|---|---|
| 1A | 49456BAM3 | 89236THM0 | 89236THM0 |
| 2B | 05253JAU5 | 28404Hk424 | 290424244 |
| 3C | USD2404JL | 20880USD839 | 2040848 |
I've used 3 Join tools:
1) Fund = Fund and Security = ISIN
2) Fund = Fund and Security = Cusip
3) Fund = Fund and Security = BBSEC
Now the problem is that some Security values are present in more than one field of the second file, so the same record comes out of two Join outputs, which is effectively a duplicate. For example, Security = 89236THM0 is present in both Cusip and BBSEC of the second file.
How do I handle this?
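For reference, here is a minimal pandas sketch (outside Alteryx, using the sample values above) of the three joins unioned together, just to show where the duplicate comes from; it is an illustration, not the workflow itself:

```python
import pandas as pd

# Stand-ins for the two files described above.
file1 = pd.DataFrame({
    "Fund": ["1A", "2B", "3C"],
    "Security": ["89236THM0", "05253JAU5", "USD"],
})
file2 = pd.DataFrame({
    "Fund": ["1A", "2B", "3C"],
    "ISIN": ["49456BAM3", "05253JAU5", "USD2404JL"],
    "Cusip": ["89236THM0", "28404Hk424", "20880USD839"],
    "BBSEC": ["89236THM0", "290424244", "2040848"],
})

# The three joins from the post, unioned together.
joined = pd.concat(
    [
        file1.merge(file2, left_on=["Fund", "Security"], right_on=["Fund", "ISIN"]),
        file1.merge(file2, left_on=["Fund", "Security"], right_on=["Fund", "Cusip"]),
        file1.merge(file2, left_on=["Fund", "Security"], right_on=["Fund", "BBSEC"]),
    ],
    ignore_index=True,
)

# Fund 1A comes out twice, because 89236THM0 matches both Cusip and BBSEC.
print(joined)
```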
Hi @AnandKumar1
You can use the Unique tool to remove duplicates. If the rows are exact duplicates, you can select all columns as key columns.
https://help.alteryx.com/20212/designer/unique-tool
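For illustration only, the pandas equivalent of the Unique tool with every column selected as a key column would look something like this (made-up rows standing in for the unioned join output, not your actual data):

```python
import pandas as pd

# A tiny frame with an exact duplicate row, standing in for the unioned join output.
combined = pd.DataFrame({
    "Fund": ["1A", "1A", "2B"],
    "Security": ["89236THM0", "89236THM0", "05253JAU5"],
})

# All columns act as key columns: only fully identical rows are collapsed to one.
deduped = combined.drop_duplicates(keep="first")
print(deduped)
```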
Hi @AnandKumar1 ,
Can you share a sample workflow up to the Union tool, so that we can see the Join tool configuration and build/suggest accordingly?
This is my problem statement
I have many files in my directory which I'm trying to join based on a condition. Some files in one folder have a basic structure like the one below.
Some files have only Sec_Id or isin or cusip; some files have all 3 fields.
| Fund | cusip | isin | bbsec |
|---|---|---|---|
| 123 | 637ABC | US4971VA | 744573AP1 |
| 124 | US2130 | US27CM6 | 040WAW |
| 567 | IN3133 | US185BA8 | 894AK5 |
| 890 | 13TKJG | US56VAM2 | 040WW5 |
Then I have a few files in different folders which have a structure like below:
| Fund | Security |
|---|---|
| 123 | 744573AP1 |
| 124 | US2130 |
| 567 | US185BA8 |
| 890 | 040WW5 |
So I want to join these files based on the condition that the Security of the 2nd file exists in any of cusip, isin, or bbsec in the 1st file.
Some Security values are present in both cusip and isin, and some are present in all 3, but each value will be in at least one of these 3 fields.
Now I've tried 2 solutions. The first is Find Replace, which works fine for a single file, but when I use multiple files I get a lot of duplicates.
The 2nd solution is Joins, but this is also giving almost double the records.
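For illustration, one way to express that condition as a single lookup (a pandas sketch with the sample values above, not the actual Alteryx workflow) is to unpivot the three identifier columns and deduplicate before joining:

```python
import pandas as pd

# Stand-ins for one reference file and one Security file from the folders above.
ref = pd.DataFrame({
    "Fund": ["123", "124", "567", "890"],
    "cusip": ["637ABC", "US2130", "IN3133", "13TKJG"],
    "isin": ["US4971VA", "US27CM6", "US185BA8", "US56VAM2"],
    "bbsec": ["744573AP1", "040WAW", "894AK5", "040WW5"],
})
sec = pd.DataFrame({
    "Fund": ["123", "124", "567", "890"],
    "Security": ["744573AP1", "US2130", "US185BA8", "040WW5"],
})

# Unpivot cusip/isin/bbsec into one column, then drop duplicates so a Security
# that appears in more than one column (or more than one file) matches only once.
lookup = (
    ref.melt(id_vars="Fund", value_vars=["cusip", "isin", "bbsec"],
             value_name="Security")
       .drop(columns="variable")
       .drop_duplicates(subset=["Fund", "Security"])
)

# A single join now covers "Security exists in any of cusip, isin, or bbsec".
matched = sec.merge(lookup, on=["Fund", "Security"], how="inner")
print(matched)
```

Deduplicating the unpivoted lookup before the join is what stops the same record from matching more than once.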
Hi @AnandKumar1
Just a possibility: can you check whether the duplication is happening when you are reading the files?
@atcodedog05 Duplicates are happening only after the Join.
Actually, there are some records which are common to many files. I think a Cartesian join is happening here.
Hi @AnandKumar1
Then clearing the duplicates beforehand and using the Find & Replace approach after that should help in your use case 🙂
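A rough pandas sketch of that ordering, with hypothetical folder paths (the lookup itself would then follow as in the earlier sketch):

```python
import pandas as pd
from pathlib import Path

# Hypothetical folder of reference files with Fund / cusip / isin / bbsec columns.
frames = [pd.read_csv(path) for path in Path("reference_files").glob("*.csv")]

# Stack the files and clear duplicates *before* the lookup, so a record that is
# common to several files can only contribute one match.
reference = pd.concat(frames, ignore_index=True).drop_duplicates()

print(reference.shape)
```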