Start Free Trial

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Comparing Differences in .TXT files

caltang
17 - Castor
17 - Castor

Hi! 

 

I have about 55 txt files that exists as pairs. For example, File A1 is a pair of File A2, likewise with File B1 and File B2. 

 

My goal: Compare the differences between columns & rows, and spread them out like:

Column A1 Row A1 Value A1 Column A2 Row A2 Value A2

 

For comparison - the anchor is File _1. Hence, A1 is the anchor of A2, and B1 the anchor of B2. 

 

This is a simple application of it. But there are some more rules:

  1. There can be scenarios where the first row is not the header due to a myriad of issues. The headers are fixed for each file, some will have more or less.
  2. The order of the columns can be different.
  3. The order of the rows can also be different.

 

The order is important, so the order of the columns must be standardized first, then the rows follow next.

 

At the end of it, I need to output to an Excel (.xslx) with multiple sheets. Each sheet corresponding to the name of the file. E.G: A, B, etc.

 

Thanks!

Calvin Tang
Alteryx ACE
https://www.linkedin.com/in/calvintangkw/
5 REPLIES 5
CoG
14 - Magnetar

For verification:

What do you mean by "the first row is not the header"?

Also, Is there a key column for each file? Otherwise, how can you compare two tables if row order is uncertain?

 

Finally, is there a standard naming convention for the files that indicates which files are paired and which indicates which file is the anchor for the pairing? As of right now I am assuming that the 'A' designates the pair and '1', the anchor, but I'm not sure if you added this as a simplification to your sample data.

caltang
17 - Castor
17 - Castor

Hi @CoG !

 

Let me break it down...

 

  1. My wording was off on the header bit - there are some datasets where the first row is not the header - but a random value due to a system glitch that can't be fixed by end users. So in some cases, we have headers in the 2nd or 3rd rows - but they always start with "!*" as the first. So I suppose StartsWith([Field1],"!*") is good enough to indicate the header. 
  2. That is the question I am trying to tackle myself. There can be times where the row order shifts between two files. For instance, in File C1, Row 13 is equivalent to Row 15 in File C2. Now, I've tried joining by direct values - but this poses a "what-if" challenge - this is assuming both values are the same. It can also cause a many-to-many relationship if I join them. It also breaks down when the value changes slightly between them - so I'm stuck in this case. I was thinking of just letting the user check on their own because I simply cannot think of a mechanism to handle these exceptions myself.
  3. In terms of standard, yes - it is as sampled. A, B, C, D, E, etc. then AA, AB, AC - just like Excel column headers. 

So in terms of progress, I've managed to get the Column Orders sorted to match both sides, but Row Order is a complete mess. Any ideas?

Calvin Tang
Alteryx ACE
https://www.linkedin.com/in/calvintangkw/
CoG
14 - Magnetar

If the data isn't too large, you could opt for the many-to-many Join, Formula some sort of ranking metric (equality is the simplest and what I used in the sample), and then iteratively select the best association. This is what I was thinking (I assumed that you can plug in the solution to the header issue). Very tricky problem!

 

Main:

_Main.png

 

Batch Macro:

_Batch.png

 

Iterative Macro:

_Iterative.png

 

caltang
17 - Castor
17 - Castor

I opted for a similar approach! But your macro is much better. Thanks @CoG !

Calvin Tang
Alteryx ACE
https://www.linkedin.com/in/calvintangkw/
KGT
13 - Pulsar

I use this method all the time. You can choose any descriptor/s in the Transpose. I've added a RecordID and used that.

 

If you need to find the headers, then I use a Multi-row to identify the header row and fill from there, then a filter based on that tag followed by Dynamic Rename. After that, transpose.

 

The first join is on all fields, with the second being on all except [Value].

 

This could easily be a macro, and I don't think it differs much from the CReW Expect Equal Macro.

 

 
 

AlteryxGui_cWEwY8bcsn.png

Labels
Top Solution Authors