Community Spring Cleaning week is here! Join your fellow Maveryx in digging through your old posts and marking comments on them as solved. Learn more here!

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Comparing 2 Spreadsheets for Differences

jmmart08
8 - Asteroid

Apologies as I'm sure this question has been asked so many times.  I am trying to build a business case for the need to get Alteryx, and I am just looking for some advice on which tool(s) I would need to use to compare 2 spreadsheets to highlight any differences in any cell in both data sets.  I have figured out how to sort the 2 different inputs so the columns are lined up.  I'm just looking for a way to now compare the 2 sheets for any differences in any cell.  I have attempted using the Join tool but not sure if that is producing the result I want.

 

Any help would be greatly appreciated.

 

Regards

10 REPLIES 10
JosephSerpis
17 - Castor
17 - Castor

Hi @jmmart08 I mocked up a workflow that demonstrates the approach I've taken to this task before. Essentially you want to use the Transpose tool to pivot your data so it's easier to use the Join tool and compare against all the values of your spreadsheet. Let me know what you think?

jmmart08
8 - Asteroid

Hi JS420,

 

Thank you VERY much!  I really appreciate this.  I am so new at this that I don't really understand what the Record ID piece is doing, and also in the Join tool where at the top you have Name on the Left, Name on the Right, and RecordID on the Left and RecordID on the right.  For example, my column headers are TD, Broker, UNITS, etc.  Would I place all of those fields as the left and right entries?  I also clearly don't understand the bottom portion in the join tool either.  I do not know which of the selections to check or uncheck.  Any explanation you may be able to give me would be extremely helpful and extremely appreciated.

 

Thanks again!

JosephSerpis
17 - Castor
17 - Castor

Hi @jmmart08 the Record ID are creating a rank field the transpose tool will take the column headers and pivot them into one field called Name. All of the data within the columns will be pivoted into one field called Value. The J output of the join tool returns all the values that joined. The L output would return any values not joined from spreadsheet 1 and the same for R output with the J output. In my example everything joins so the J output is the only output I need to look out.

 

Because I pivot the data it saves me from having to try to match to every column in a report which could be a handful to a significant amount plus makes the solution dynamic.

AndrewS
11 - Bolide

Hi @jmmart08 

 

You could also try the Crew macro 'Expect Equal' It will tell you if columns have different names/are missing and if cell data is not in sync.

 

expectequal.PNG

fharper
12 - Quasar

@jmmart08 Since you are new I want to point out that when someone provides you a solution you are satisfied with you have the ability, since it is you post, to mark that response as the or A solution....sometimes there are more than one viable solution.  This is important so the people looking at the problem know so they either do not continue to contribute or if they were following to get a solution know what works.  helps other months from now find the solution also.

 

@JosephSerpis did what I would have done as well so I am confident that will be a solution.

@AndrewS had a solution as well using a CReW macro...there is often more than one solution...

 

That said that task is a drop in the ocean in respect to all that Alteryx can and will do for your organization.  When I was first introduced to Alteryx it was just 2 of us, 2 licensees, and in a few months we did so much we became the go to people to get rapid and accurate reporting and analysis on many things within our department...within a year other departments started acquiring licenses because they could do more and faster with same people.

 

Many orgs benefit because it provides a measure of independence from IT because the business experts with a little software savvy can solve a lot of issues themselves.  for people who are developers they can do many things faster so ad-hoc and analytics is much more efficient/effective compared with typical tools best suited for conventional development.

 

Just get smart fast on implementing a bit of SDLC, look it up.  You will blink and before you know you have hundreds of workflows with 10 or 15 people touching them and some becoming "productionized" and then there are macros for reusable code....you don't want bunches of different versions of flows and macros floating around on local drives plus shared drives.  Discipline and a strategy is key.

 

Enjoy the ride though...Alteryx is a great tool.

jmmart08
8 - Asteroid

Thank you very much for this information!  I have not been able to work in Alteryx for nearly 4 months but am now going to be in it probably daily.  This worked extremely well. 

 

What I am stuck on now is finding a tool that will let me change actual cell data.  For Example, 1 of the systems I get input date from uses a label of BARX while the other system I get input data from uses a label of BRS.  I would like to edit the input sheet that uses BARX and change it to BRS in every single instance.  I have multiple label differences, so with the comparison workflow I've built I always have a difference because of the labels, but the data is the exact same.  To make it easier to read and only spot actual differences in the 2 datasets, I would like to change the labels with a tool if that is possible.

 

Any suggestion would be much appreciated.  Thank you!

fharper
12 - Quasar

My current role does not provide me the opportunity to work in Alteryx so this is off memory but I am sure someone else will chime in with more detail.  

First, if I understand you correctly you have columns in different sources that are the "same" but which have different column names (labels).  Look at the Dynamic Rename tool and ways it can be configured.  I have used it to provide a convenient way to rename fields in one source to standardized field names kept in a matrix file, old column name vs new column name.  it is pretty straight forward assuming I am correct in what I think you want.

AndrewS
11 - Bolide

Hi @jmmart08 

 

If you bring in your data Input and uncheck 'First Row Contains Field Names' you could then use a FindReplace followed by a dynamic rename. I've attached an example, let me know if this helps.

jmmart08
8 - Asteroid

Hi AndrewS

 

Thank you very much for this.  Very helpful!  If I may just ask a few more questions regarding this AMAZING Workflow that you have provided me.  It works great for static data, meaning as long as no new information is introduced to either of the systems that I export out of I have no issues.

 

What I am trying to figure out how to do is identify solely those data points between the 2 outputs that are different.  What happens is if 1 system introduces a new element (For example 1 of my outputs introduces a new account number and I haven't accounted for this is my developed workflow) then everything below this new row now shows up as a difference because the rows aren't 1 to 1.  I am trying to build a process where only the newly introduced variable shows on my True side of the output.  There are several differences when this happens because again this new element throws off the 1 to 1 in this workflow and there aren't true differences in them;  only the newly introduced variable that I have not accounted for is truly different. This is something I can easily fix as the developer but the users that will work off this workflow will not have access and are stuck until I add this information.

 

To try to summarize, is there a way that I can account solely for just the difference between 2 completely different outputs without having to parse the data down where rows are 1 to 1?  Again, this "throws" off other rows so now it is not 1 to 1. 

 

Any suggestions would be much appreciated!  If you need a workflow let me know.  Apologies as I would have to mock up some information and that would affect some parameters I have built out such as filters, sorting, etc.

Labels