I have 2 datasets with the same column headers. I want to do a lookup and remove the rows that share a common value in one column (say, Column A). What would be the best approach? I would appreciate it if you could show some dummy examples.
My suggestion is to load the 2 datasets with separate Input Data tools. Add an identifier column to each, with the value "dataset 1" or "dataset 2". Then Union the two together, use a Unique tool to remove duplicates (make sure to exclude the identifier column from the fields it compares), and split them back into 2 datasets with a Filter tool on the identifier column.
This method keeps the first instance found in each set of duplicates and removes the rest.
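For anyone who wants to see the logic outside of a workflow, here is a rough pandas sketch of the same Union / Unique / Filter idea on dummy data. It is only an illustration, not the Alteryx tools themselves: the sample values and the identifier column name "source" are made up, and it assumes duplicates are judged on Column A alone.

```python
import pandas as pd

# Dummy data: both datasets share the same headers; A == 3 appears in both.
ds1 = pd.DataFrame({"A": [1, 2, 3], "B": ["x", "y", "z"]})
ds2 = pd.DataFrame({"A": [3, 4, 5], "B": ["p", "q", "r"]})

# Identifier column ("source" is a hypothetical name) so the rows can be split later.
ds1["source"] = "dataset 1"
ds2["source"] = "dataset 2"

# Union: stack the two datasets, dataset 1 first.
combined = pd.concat([ds1, ds2], ignore_index=True)

# Unique: compare on column A only (identifier excluded), keep the first instance.
unique_rows = combined.drop_duplicates(subset=["A"], keep="first")

# Filter: split back into two datasets on the identifier column.
out1 = unique_rows[unique_rows["source"] == "dataset 1"].drop(columns="source")
out2 = unique_rows[unique_rows["source"] == "dataset 2"].drop(columns="source")

print(out1)
print(out2)
```

Because dataset 1 is stacked first, it keeps its row with A = 3 and dataset 2 loses its copy.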
If, instead, you want to remove the duplicates from both datasets, you can take the D output of the Unique tool, join it back to each of the original datasets 1 and 2 on the same column, and take the left (or right) output of each join, i.e. the records that did not match.
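The "remove from both" variant can be sketched in pandas as well. Again, this is only an illustration under assumptions: the dummy values are invented, duplicates are judged on Column A alone, and a left merge with an indicator column stands in for taking the unmatched output of a Join.

```python
import pandas as pd

# Dummy data: A == 3 appears in both datasets.
ds1 = pd.DataFrame({"A": [1, 2, 3], "B": ["x", "y", "z"]})
ds2 = pd.DataFrame({"A": [3, 4, 5], "B": ["p", "q", "r"]})

# Union, then collect the duplicate rows (the equivalent of the Unique tool's D output).
combined = pd.concat([ds1, ds2], ignore_index=True)
dups = combined[combined.duplicated(subset=["A"], keep="first")]

# Keep one row per duplicated key so the join cannot multiply rows.
dup_keys = dups[["A"]].drop_duplicates()

def drop_matched(df: pd.DataFrame, keys: pd.DataFrame) -> pd.DataFrame:
    """Return the rows of df whose A value does not appear in keys."""
    merged = df.merge(keys, on="A", how="left", indicator=True)
    unmatched = merged[merged["_merge"] == "left_only"]
    return unmatched.drop(columns="_merge").reset_index(drop=True)

clean1 = drop_matched(ds1, dup_keys)
clean2 = drop_matched(ds2, dup_keys)

print(clean1)
print(clean2)
```

Here the row with A = 3 is dropped from both results, since neither copy survives the join back.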