I am currently using the join tool to do my vlookup but I am quickly finding out it may not exactly be a proxy for vlookup like I thought. For example, I have a data set that has 2 columns - unique ID and Date. I need to bring date in by vlookup to my transaction table. My transaction table has multiple records for the same unique ID, however, this report is ran for one day at a time, so the date column for all records on my transaction table is the same.
When I do my join, its adding every transaction for each unique ID instead of just populating the date, in my configuration, I have de-selected all of my columns from the transaction table except the Date column. Is this a system limitation of Alteryx or am I configuring something incorrectly? Is there a more effective way to do an actual vlookup? Please see example below.
Data Set: Transaction Table: After Joining:
Unique ID Date Unique ID Date Trans Type Unique ID Date
1 Blank 1 2/20/24 3 1 02/20/24
2 Blank 2 2/20/24 3 2 02/20/24
3 Blank 3 2/20/24 3 3 02/20/24
4 Blank 4 2/20/24 3 3 02/20/24
5 Blank 5 2/20/24 3 4 02/20/24
5 2/20/24 3 5 02/20/24
3 2/20/24 3 5 02/20/24
I am currently just adding a unique tool after the join to remove the duplicates, however, I'd prefer to not have to do this. Is this really the best way to do a vlookup?
Thank you,
Put a summarize tool after the join and group by all columns, it is equivalent to SELECT DISTINCT
So to clear, there is no way to do an exact vlookup- I am always going to need to do unique tool or summarize after? Seems pretty inefficient?
You can use a Find Replace tool to do more similar to a VLOOKUP - the main difference is that the VLOOKUP will get the first match (top down) whereas the Find Replace will find the last match (bottom up).
As others have pointed out, the solution is to either use a Unique tool or a Summarize on your Transaction data before joining. This may seem ineffecient however this is the price you have to pay when you have duplicates in your data. The primary difference between Join and vlookup is that Join will include duplicate rows while Vlookup just takes the first result when searching from the top. This works for your data set because the duplicate values of 3 and 5 have the same date, however in instances when these are different, this can cause issues. By changing the summarize tool to do max/min, or sorting before using a unique tool, you can actually have control over what Alteryx is joining, rather than vlookup where it ends up being sort of random which value it brings in.
Some resources to help: VLookUps with Designer - Alteryx Community and Alteryx for Excel Users: How to do a VLOOKUP in Alteryx
@CDIns you have two efficiencies to consider, time to develop and time to execute. Alteryx definitely optimizes development time. If you want to optimize execute time then write in C instead of Alteryx. That being said, I've found Alteryx to be blazing fast to run and exponentially faster than developing in C.
Long ago we had to also optimize memory, but no one worries about that anymore. Writing in C might reduce the operation from 1 second to 0.1 second. I don't worry about that anymore unless dealing with workflows that run over 30 minutes.
Welcome to Alteryx atom where solving is much more thrilling than tuning. :)
Hi @hroderick-thr , I support @alexnajm 's approach as the Find Replace Tool offers ability to 'append'. Should be called Find Replace/Append Tool
RE: this statement: So to clear, there is no way to do an exact vlookup- I am always going to need to do unique tool or summarize after? Seems pretty inefficient?
A vlookup is an an Excel specific thing which is a choose the first entry/join. that's it. It is not a native SQL funtion, or a programming language standard structure. There is nothing efficient or ineffiecient about trying to recreate it in Alteryx. You can get the exact same results as your vlookup, but how you go about it may vary depending upon your data. I am old. I am old enough to remember when Excel was used as a spreadsheet - and this is where the concept of a VLOOKUP came from. This was never intended as a large scale data matching function.
Also re:
Is this a system limitation of Alteryx or am I configuring something incorrectly?
this is how Joins work. Join A -> Join B creates a record for ever matched entry. If you are using non-unique fields in both data sources you will receive a match fo A*B records - since in the real world (aka SQL) - we cannot determine which record is the correct entry to match with - so we match with all entries. You avoid this by making sure one of your data streams in unique. I use the summarize tool in group by mode. This is correct behavior - and is invaluable at identifying situations where datasources should be unique but are not - and upstream data issues.
@HomesickSurfer but but but Find/Replace is not on my Favorites tab! Just kidding...It may handle unknown columns better than Summarize. Summarize is quicker to configure. Both work and will be long forgotten after today 😉