Advent of Code is back! Unwrap daily challenges to sharpen your Alteryx skills and earn badges along the way! Learn more now.

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.

Is there a more effective way to do a Vlookup?

CDIns
8 - Asteroid

I am currently using the join tool to do my vlookup but I am quickly finding out it may not exactly be a proxy for vlookup like I thought. For example, I have a data set that has 2 columns - unique ID and Date. I need to bring date in by vlookup to my transaction table. My transaction table has multiple records for the same unique ID, however, this report is ran for one day at a time, so the date column for all records on my transaction table is the same. 

 

When I do my join, its adding every transaction for each unique ID instead of just populating the date, in my configuration, I have de-selected all of my columns from the transaction table except the Date column. Is this a system limitation of Alteryx or am I configuring something incorrectly? Is there a more effective way to do an actual vlookup?  Please see example below. 

 

Data Set:                                                           Transaction Table:                                                                       After Joining:

 

Unique ID        Date                                    Unique ID         Date           Trans Type                                                             Unique ID         Date  

1                            Blank                                           1                      2/20/24            3                                                                                   1                 02/20/24

2                            Blank                                           2                      2/20/24             3                                                                                  2                 02/20/24

3                            Blank                                           3                      2/20/24             3                                                                                  3                  02/20/24

4                            Blank                                           4                     2/20/24              3                                                                                  3                  02/20/24

5                            Blank                                           5                    2/20/24              3                                                                                   4                  02/20/24

                                                                                       5                    2/20/24               3                                                                                 5                   02/20/24

                                                                                       3                   2/20/24              3                                                                                   5                  02/20/24

                                             

 

I am currently just adding a unique tool after the join to remove the duplicates, however, I'd prefer to not have to do this. Is this really the best way to do a vlookup? 

 

Thank you, 

9 REPLIES 9
hroderick-thr
11 - Bolide

Put a summarize tool after the join and group by all columns, it is equivalent to SELECT DISTINCT

CDIns
8 - Asteroid

So to clear, there is no way to do an exact vlookup- I am always going to need to do unique tool or summarize after? Seems pretty inefficient? 

alexnajm
17 - Castor
17 - Castor

You can use a Find Replace tool to do more similar to a VLOOKUP - the main difference is that the VLOOKUP will get the first match (top down) whereas the Find Replace will find the last match (bottom up).

TheMattLeonard
8 - Asteroid

As others have pointed out, the solution is to either use a Unique tool or a Summarize on your Transaction data before joining. This may seem ineffecient however this is the price you have to pay when you have duplicates in your data. The primary difference between Join and vlookup is that Join will include duplicate rows while Vlookup just takes the first result when searching from the top. This works for your data set because the duplicate values of 3 and 5 have the same date, however in instances when these are different, this can cause issues. By changing the summarize tool to do max/min, or sorting before using a unique tool, you can actually have control over what Alteryx is joining, rather than vlookup where it ends up being sort of random which value it brings in.

alexnajm
17 - Castor
17 - Castor
hroderick-thr
11 - Bolide

@CDIns you have two efficiencies to consider, time to develop and time to execute. Alteryx definitely optimizes development time. If you want to optimize execute time then write in C instead of Alteryx. That being said, I've found Alteryx to be blazing fast to run and exponentially faster than developing in C.  

 

Long ago we had to also optimize memory, but no one worries about that anymore. Writing in C might reduce the operation from 1 second to 0.1 second. I don't worry about that anymore unless dealing with workflows that run over 30 minutes.

 

Welcome to Alteryx atom where solving is much more thrilling than tuning.  :)

HomesickSurfer
12 - Quasar

Hi @hroderick-thr , I support @alexnajm 's approach as the Find Replace Tool offers ability to 'append'.  Should be called Find Replace/Append Tool

apathetichell
19 - Altair

RE: this statement: So to clear, there is no way to do an exact vlookup- I am always going to need to do unique tool or summarize after? Seems pretty inefficient? 

 

A vlookup is an an Excel specific thing which is a choose the first entry/join. that's it. It is not a native SQL funtion, or a programming language standard structure. There is nothing efficient or ineffiecient about trying to recreate it in Alteryx. You can get the exact same results as your vlookup, but how you go about it may vary depending upon your data. I am old. I am old enough to remember when Excel was used as a spreadsheet - and this is where the concept of a VLOOKUP came from. This was never intended as a large scale data matching function.

 

Also re:

Is this a system limitation of Alteryx or am I configuring something incorrectly? 

 

this is how Joins work. Join A -> Join B creates a record for ever matched entry. If you are using non-unique fields in both data sources you will receive a match fo A*B records - since in the real world (aka SQL) - we cannot determine which record is the correct entry to match with - so we match with all entries. You avoid this by making sure one of your data streams in unique. I use the summarize tool in group by mode. This is correct behavior - and is invaluable at identifying situations where datasources should be unique but are not - and upstream data issues.

hroderick-thr
11 - Bolide

@HomesickSurfer  but but but Find/Replace is not on my Favorites tab!  Just kidding...It may handle unknown columns better than Summarize. Summarize is quicker to configure. Both work and will be long forgotten after today 😉

Labels