Hi all,
I've got 2 files that i'm trying to cross check.
File #1 contains a large amount of data, inclusive of rows of data (identified via a Unique ID) that are irrelevant for my purposes.
File #2 contains a list of all relevant unique ID's that i'd like to keep (alternatively i also have file #3 that has all irrelevant Unique IDs)
What i'm trying to do is cross check File #1 against File #2 and delete all rows in File #1 with Unique Id's not within File #2
e.g.
Before
File #1
Unique ID | Description | Amount |
123.XYZ | test | 2 |
223.ABC | test 1 | 10 |
345.123 | test 2 | 5 |
File #2
Unique ID |
223.ABC |
345.123 |
After
File #1
Unique ID | Description | Amont |
223.ABC | test 1 | 10 |
345.123 | test 2 | 5 |
Résolu ! Accéder à la solution.
If you connect the two files using the join tool, joining on the unique ID, then center output should be what you are looking for - it'll be everything from file #1 (the left input) that had a match found in file #2 (the right input).
Hi Both,
sorry, I may have misrepresented the ID portion of my question.
The actual ID i'm filtering on is an account number (rather than a unique line number)
as such there may be 100,000 lines associated to this account number that I need deleted.
When I run the join workflow comparing my 2 files, the output is only 2,000 lines long as it is taking the first 2,000 instances of this account filter, rather than 1,000,000 that I expect.
Given that the Unique id (per file #2) is not unique to each line in File #1, is there a way to delete all rows in file 1 with that account number?
e.g.
Before
File #1
Line ID | Account ID | Description | Amount |
123.XYZ | 123.100 | test | 2 |
223.ABC | 123.200 | test 1 | 10 |
345.123 | 123.200 | test 2 | 5 |
File #2
Unique Account ID |
123.200 |
After
File #1
Line ID | Account ID | Description | Amount |
223.ABC | 123.200 | test 1 | 10 |
345.123 | 123.200 | test 2 | 5 |
I think I misunderstood what you were initially trying to do, however, I still believe joining the two files on the account ID is the way to go. Assuming you connect file #1 to the left input, and file #2 to the right, the three outputs should give you the following:
Left output: all account IDs from file #1 that did not have a matching account ID in file #2
Join (center) output: all account IDs that appear in both file #1 and file #2
Right output: all account ID's from file #2 that did not have a matching account ID in file #1
That said, I think the left output is what you are looking for. Anything from file #1 that has an account ID contained in file #2 will be sent to the Join (center) output, leaving only file #1 account IDs without a match in the left output. If this is not the case, let me know, and we can try to troubleshoot the problem further.
Solved!
realised it was actually a problem with my File #2 that was causing my problem
thanks so much!