Hi Team,
Hope you are well.
i have problem. I am trying to join
1) City_1 with 2) Name in the country_city file.
Ideally the record should be 2641 as in the City_1. But its giving double the value.
can you please help?
thanks,
Solved! Go to Solution.
Hi @sriniprad08
The problem is because of the data set.
In the City file, take an example Amsterdam.
For the City Amsterdam in world cities, it has 2 values one with Country Netherlands and United States.
Hence single line item in city file, is joined with 2 items in world cities file leading to double the value of items from the cities file.
Many thanks
Shanker V
Hi @ShankerV ,
Thank you. But even if i select only the City_chk and Name and exclude all other fields. it still double the value.
Hi @sriniprad08
Taking a sample use case to explain the problem.
Dataset 1:
Dataset 2:
So if the dataset 2 has unique values for cities, then we can expect the same volume in the output post joining.
Many thanks
Shanker V
Selecting certain fields doesn’t change the number of rows that are causing the duplication - I’d use a Unique tool on the given fields (or perhaps some Group Bys in a Summarize) to get a unique list of values.
Hi @sriniprad08
Yes, its true. Because the join (vlookup) criteria is City.
In the below, these are just select tool criteria to deselect few columns which are not needed in output.
Hi @sriniprad08
As @alexnajm suggested for Unique tool, Unique tool takes the first value. In case of London, it will consider the 1st record and it will take the country as canada.
Hence as per the business requirement skip the duplicated rows in the dataset2 world cities file.
The root cause is because of the records 1:2 in dataset1:dataset2 respectively.
You can expect the same count of records in output if the data's are in 1:1 ratio.
Many thanks
Shanker V
Good call @ShankerV!