Community Spring Cleaning week is here! Join your fellow Maveryx in digging through your old posts and marking comments on them as solved. Learn more here!

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Comparing data sets

aparna0208
8 - Asteroid

I am trying to build a workflow to compare two data sets(first input is the excel file and other input is connected to a query in database) to see if the account in one data set exists in the other and return the matched and unmatched records. Any suggestions to do this better would be appreciated. I am totally new to Alteryx and in the process of learning things to do more efficiently. Thank you!

8 REPLIES 8
Aguisande
15 - Aurora
15 - Aurora

Hi @aparna0208 

I think a Join will give you what you're looking for.

Take an input data tool and read the excel file, connect it to the L input connector of the Join Tool.

Then grab another one and query the database and connect it to the R input connector of the Join.

Capture.PNG

Then try them to join by account number and you'll get all results in three ways:

- From the L output connector: All records from the L input dataset (the Excel file) that doen't exist on your DB Query

- From the R output connector: All records in the DB Query that doesn't exists in the Excel

- From the J output connector (The middle one): All existing in both datasources. 

Capture2.PNG

This Venn diagrams below the Join tool field config give you an idea of what you're going to get from the outputs (Left UNJOINED, joined and right UNJOINED)

 

Hope this helps.

 

 

aparna0208
8 - Asteroid

This will definitely help as this is exactly what I was looking for. Thank you so much @aguisande

Aguisande
15 - Aurora
15 - Aurora

Great! Anytime @aparna0208 !

ssabapa5
5 - Atom

Hi Aguisande,

 

Is there any other option to check the two values and trigger an error message or stop the flow, if it is "false"?

 

Thanks

cheerspankaj
7 - Meteor

Hi,

 

I have a version table which has one record for every change done for one or more column in the table at same time.

I am trying to use Alteryx to find what columns were changed in each version and the corresponding value.

Is it possible to do in Alteryx.

 

Example:

 

IDCOL1COL2VERSIONCREATION_DATEUPDATED_DATE
100AB110-Jan-2019 
100A1B2 11-Jan-2019
100A2B13 12-Jan-2019
      

 

In the above example, for ID:100, value changed for COL1 in version 2 and value changed for both COL1 and COL2 in version 3.

 

I want to show each version wise the attribute whose value changed and the value of same.

 

Thanks,

Pankaj

Aguisande
15 - Aurora
15 - Aurora

Hi @cheerspankaj 

What is awesome with Alteryx is that the answer to the question "Is it possible to do in Alteryx." is YES!

 

What I'd do is :

- First: Filter all UPDATED_DATE that are not Empty or Null (This should mean that that record had changed)

- Then: Use a Summarize Tool, and Group by: UPDATED_DATE, VERSION, COL1 (Assuming is the original Date) and get COL2 using Concatenate as the Aggragation method.

 

Hope this helps.

cheerspankaj
7 - Meteor

Hi, @Aguisande this is helpful.

I can group by specific column in a table and concatenate all values for each column for different records in the table for group by column, so I am able to see all attribute and its value in each version as per group by column.

 

Is there a way I can make it more dynamic where Alteryx can only shows those columns and its value which are changed between versions, see the example below.

MEMBER_IDMEMBER_NAMECREATION_DATEUPDATE_DATEVERSIONSTATECITYZIP
100MEMBER_X110-Jan-2019 1PACity112345
100MEMBER_X111-Jan-201911-Jan-20192PACity245678
101MEMBER_Y110-Mar-2019 1NJCity378645
101MEMBER_Y111-Mar-201911-Mar-20192NJCity489564

 

Now, the result expected should have like: ( only showing columns whose value has changed)

MEMBER_IDCREATION_DATEVERSIONCITYZIP
10010-Jan-2019|11-Jan-20191|2City1|City212345|45678
10110-Mar-2019|11-Mar-20191|2City3|City478645|89564

 

So In above table, we need to display only columns whose value has changed between two version of records ( user would like to see only attributes whose value changed between versions).

 

Anther possible output format could be: Only showing column whose value is changed.

 

MEMBER_IDColumnNameVersion1Version2
100Creation Date10-Jan-201911-Jan-2019
100CityCity1City2
100Zip1234545678
101Creation Date10-Mar-201911-Mar-2019
101CityCity1City2
101Zip7864589564

 

I think creating above output format may be complex, where columns will increase automatically depending on more and more versions of records created in the source table.

 

Let me know your thoughts, how we can create above output formats dynamically.

Aguisande
15 - Aurora
15 - Aurora

@cheerspankaj 

 

To get the first expected result, I'll suggest this:

 

- Filter the modified records (UPDATE_DATE is Not Null). Based on the data, this gives you ONLY those records that registered a modification.

- Group the records by MEMEBER_ID and concatenate the values (CREATION_DATE, VERSION, etc).

 

wf.jpg

 

- Once you have both streams, join them together by MEMBER_ID and deselect all Left incoming records

joinconfig.jpg

 

You'll end up with this:

result.jpg

 

The only shortcoming of this is that it can't evaluate that STATE remains unchanged. 

Labels