Hi,
I would like to know why the number of records increases after I join two datasets.
For example, below is the workflow:
The Select tool outputs 1,972,634 records.
The dataset postcode_states has 53,441 records.
But after I join the two datasets, I get 250,812,652 records.
Does anyone know why this happens and how to solve it? From my understanding, the record count should stay the same or decrease, depending on how many records are matched.
Thank you for your time.
Hi @faiqz
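The likely issue is that your join key has duplicates in both datasets, which causes a many-to-many join: every record with a given key on one side is paired with every record with the same key on the other side.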
Example
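To make the multiplication concrete, here is a minimal Python sketch (not Alteryx itself) that counts how many rows an inner join would produce. The function name and the sample postcodes are made up for illustration; the only idea it relies on is that each key contributes (left count) x (right count) rows.

```python
from collections import Counter

def expected_join_rows(left_keys, right_keys):
    """Return the number of rows an inner join on these keys would produce."""
    left_counts = Counter(left_keys)
    right_counts = Counter(right_keys)
    # Each left row with key k is paired with every right row with key k,
    # so key k contributes left_counts[k] * right_counts[k] rows.
    return sum(n * right_counts[k] for k, n in left_counts.items())

# Hypothetical data: postcode 2000 appears 3 times on the left, 2 times on the right.
left = [2000, 2000, 2000, 3000]
right = [2000, 2000, 3000, 4000]

rows = expected_join_rows(left, right)
print(rows)  # 3*2 + 1*1 = 7, more than either input has
```

With only four records on each side the join already returns seven, so a few heavily duplicated keys can plausibly turn about 2 million records into hundreds of millions.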
You would need to make the key unique in at least one dataset to prevent the record count from exploding.
This is a video I found on the topic:
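As a rough sketch of that fix, again in plain Python rather than Alteryx, the snippet below removes duplicate keys from the lookup side before joining. The helper names, the "orders" data and the field names are hypothetical; in Alteryx the closest equivalent is probably a Unique tool on the key field ahead of the Join, but treat that as an assumption about your workflow.

```python
def dedupe_by_key(rows, key):
    """Keep only the first row seen for each key value."""
    seen = {}
    for row in rows:
        seen.setdefault(row[key], row)
    return list(seen.values())

def inner_join(left, right, key):
    """Plain inner join: pair each left row with every matching right row."""
    index = {}
    for r in right:
        index.setdefault(r[key], []).append(r)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]

# Hypothetical data: postcode 2000 is duplicated on both sides.
orders = [
    {"postcode": 2000, "order": "A"},
    {"postcode": 2000, "order": "B"},
    {"postcode": 3000, "order": "C"},
]
postcode_states = [
    {"postcode": 2000, "state": "NSW"},
    {"postcode": 2000, "state": "NSW"},
    {"postcode": 3000, "state": "VIC"},
]

exploded = inner_join(orders, postcode_states, "postcode")
fixed = inner_join(orders, dedupe_by_key(postcode_states, "postcode"), "postcode")
print(len(exploded), len(fixed))  # 5 3
```

Once the lookup side has one row per postcode, the output can never exceed the record count of the other side. Keeping the first row is only safe if the duplicates carry the same values; if one postcode maps to several states, you would need to decide which one to keep.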
https://www.youtube.com/watch?v=qLMwRxKDhxQ
Hopefully someone else can pitch in and help you understand this in more depth.
Hope this helps : )