Hi, I have one dataset with 83k rows and another with 32k. The first dataset has the columns "name" and "correct name"; the second has "nameds" and "correct name". Neither dataset contains duplicates. I run a Union and the result shows just 82k rows (so fewer than I started with).
As I understand it, a union appends all the rows of one dataset below the other. Is that correct?
thanks
JP
Hi @juan sánchez, that's because what you see in the browser is a sample, i.e. a representative set of rows from the raw dataset, not the full data. Take a Quick Scan random sample after the Union and you will see rows from both datasets appear. The documentation on sampling is definitely worth a read: https://docs.trifacta.com/display/SS/Overview+of+Sampling.
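To make the point concrete, here is a minimal Python sketch of what a union does to row counts. It is not Trifacta code: the `union_rows` helper, the generated rows, the exact round row counts, and the sample size of 82,000 are all assumptions for illustration, and the product chooses its samples by its own rules. It only shows that the full union holds every row from both inputs, while a sample of that union shows fewer.

```python
import random

def union_rows(first, second, column_map):
    """Append the rows of `second` below `first` (hypothetical helper).

    `column_map` renames columns of `second` so they line up with `first`,
    e.g. "nameds" -> "name".
    """
    renamed = [{column_map.get(k, k): v for k, v in row.items()} for row in second]
    return first + renamed

# Made-up data with the approximate sizes from the question.
first = [{"name": f"a{i}", "correct name": f"A{i}"} for i in range(83_000)]
second = [{"nameds": f"b{i}", "correct name": f"B{i}"} for i in range(32_000)]

# The full union keeps every row: 83,000 + 32,000.
full = union_rows(first, second, {"nameds": "name"})
print(len(full))

# A sample is only a subset of the union, so its row count is lower.
# The sample size here is an arbitrary illustrative value.
random.seed(0)
sample = random.sample(full, 82_000)
print(len(sample))
```

So a row count below the sum of the inputs in a preview does not mean rows were dropped; the count to check is the one on the full output of the job.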
Best
Vardan
thanks a lot Vardan ;)