When I had to compare text files at a previous company, we'd use Beyond Compare to show only the lines that differed. It was smart enough to also skip empty lines so that the line number did not matter. It was really useful when trying to compare code. I haven't come across an clean way to do this in Alteryx.
My first thought was to join, but then I had an issue where one file had a replicated line in it and the join did not catch it since it would join with the first occurrence of the other file.
Any suggestions on this?
We'll need some data to see what you mean.
I agree with @caltang that we'll need to see more - also the Join will capture all matches, not just the first match, so maybe you tired a find Replace accidentally?
A simple example would be:
File 1 (four lines):
If this then that else something
The quick brown fox
jumps over the lazy dog
The quick brown fox
File 2 (two lines of code plus an empty line between):
The quick brown fox
jumps over the lazy dog
Desired Result:
If this then that else something
The quick brown fox
The result would show the differences, but the process knows that there's an extra line of 'The quick brown fox' in File 1 as well as knowing to skip the empty line in File 2.
The Join will capture both as you originally indicated in your question - if you want just one match, use a Find Replace
Please use your example in Alteryx and see what's wrong so we can pinpoint an answer better for you. You can also just remove your empty lines with a filter or a Data Cleansing tool if you are trying to Join by record position
Or if you just need unique matches, use a unique tool after your join - lots of solutions!
@AndyJay Alteryx would catch this, you technically might have to remove null or blank lines but using the join tool would catch it. My previous company also used Beyond Compare, however, I wouldn't even download it onto my machine in protest haha. Alteryx can definitely do the comparisons you need and more.
Agreed with @alexnajm, was a Find and Replace accidentally used?
Bacon
Thank you for the tip on removing empty lines. Find/replace was not accidentally used. What happened was both instances of 'The quick brown fox' from File 1 were matched with File 2. What I would like is to be able to see the left side, File 1, to show the difference on the left join anchor. It should have 'If this then that else something' and 'The quick brown fox' appearing once because the other occurrence of 'The quick brown fox' was matched with File 2:
If this then that else something
The quick brown fox
Right side (File 2) would then show nothing, since both lines were matched.
I'm thinking a record and sort tool may be the best way.
This is different than your original post, so that's where the confusion lies. "My first thought was to join, but then I had an issue where one file had a replicated line in it and the join did not catch it since it would join with the first occurrence of the other file." --> it does catch it, which is why we figured the find Replace was being used
the Join is your best bet in this case - you'll see what matched (and you can use a Unique after as myself and @abacon indicated to get one line of each) and you'll see what didn't match out of the L and R anchors
I knew join would be involved but I'm not sure unique would have helped. For example if 'The quick brown fox' appears 15 times on the left file and 20 on the right, I'd want 'The quick brown fox' to appear five times on the right and zero times on the left.
I should have posted true sample code that has lines being used several times. The way I ended up going about this was to
Sort the left/right outputs by record ID ascending to see true changes.
Thank you for all who replied and I apologize for confusion. I didn't know an easy way to ask this but not accidentally disclose code without company permission.