community
cancel
Showing results for 
Search instead for 
Did you mean: 

Engine Works Blog

Under the hood of Alteryx: tips, tricks and how-to's.
Alteryx
Alteryx

I remember the first serious reason I used Excel. It was autumn 2005, I was a college freshman, and had major aspirations: becoming a professional poker player. That was around the peak of poker. Chris Moneymaker and Greg Raymer won in 2003 and 2004 respectively. The first place prize for 2006 World Series of Poker Main Event was $12,000,000. I was engrossed in it playing with friends, family, and strangers both online and in-person.

 

As a video gamer, I’m used to having my “stats” tracked so I wanted the same with poker. Excel was my first choice after seeing a friend create his own. It was designed to track session dates, times, and monetary returns while blending online and offline results. It was nothing spectacular since all it had were basic functions and a pretty format.

 

The spreadsheet revealed a few important findings. My monetary return over time was terrible. I couldn’t control the variance and wasn’t mature enough to handle it. Essentially, I wasn’t very good. Nothing hurts more than the shattered dreams of becoming a professional poker player (totally not hyperbolic).

 

poker.gifMFW I realize I'm not good at poker.

MFW I realize I’m not good at poker.

 

It was time to move on and evolve my dreams. I used to love sports (I still do, but used to too), so I set my goal to join the front office of the Denver Nuggets or Broncos. Neither gave me the time of day, so I moved on to the next best thing: fantasy basketball and football.

 

I painstakingly copied and pasted NBA and NFL data into spreadsheets to create my own draft rankings. Then used them to manage my rosters with great results. At the time, the overhead was too much so I scraped them and decided to use my amazing and flawless judgement to manage fantasy teams going forward (I’ve won about two fantasy leagues since).

 

Semesters into my collegiate career, my finance and accounting professors began assigning projects and homework that required Excel. Saying I was excited is an understatement.

 

Why does this all matter? For better or worse, Excel is the number one BI tool and may stay that way for a long time. According to Business Insider, Microsoft CEO Satya Nadella says their best consumer product is Excel. For many including myself, Excel is their first experience with data preparation, blending, and analytics. I was a business student learning finance and accounting where spreadsheets are the backbone of those industries. So how did Excel lay this foundation for me? It starts with two functions: VLOOKUP and IF.

 

The IF function is outside the scope of this post, so we’ll explore the VLOOKUP function. Let’s do a quick refresher. The function finds a specified value in a table and returns a value from a specified column.

 

Here’s the rundown of a VLOOKUP example below:

  • The VLOOKUP arguments box shows for cell H3, which contains =VLOOKUP(G3,A:E,4,FALSE). (green box)
  • The Lookup_value is G3, which is the name Lee. (red boxes)
  • The Table_array is columns A through E. (blue boxes)
  • The Col_index_num is 4, which is the Sales column. (orange boxes)
  • The argument FALSE means the match must be exact to return the desired value.
  • The function will look for Lee in the leftmost column of the table array, which is First Name. If Lee is found, then Sales is returned. Since Lee does exist in Table 1, the function returns the Sales value $170,000. (purple boxes)

 

vlookup_example.png

 

As other articles have stated, Alteryx can duplicate VLOOKUP with the Join, Join Multiple, and Find Replace tools. Let’s look at how to duplicate this simple VLOOKUP with a Find Replace tool. As shown below, both tables are loaded with Input Data tools and connect to a Find Replace tool.

 

 findreplace_input.png

 

The tables as viewed in Alteryx:Table1_Table2.png

 

The Find Replace tool does the following:

  1. Looks for Name from Table 2 in the field First Name from Table 1.
  2. If the Name is found in First Name, then Sales is returned.

 

The output of the Find Replace tool has Sales appended to each name:

 findreplace_output.png

 

Let’s review the configuration of the Find Replace tool.

 

 findreplace_config.png

 The Find  section:

 

The tool takes each record from the Name field of Table 1 and looks for it in the First Name field of Table 2.

 

 

The Replace  section:

 

If found, append Sales record to the Name.

 

 

But wait! There is something wrong with the data that affects both the VLOOKUP and Find Replace. Table 2 will look for Angelina, but there are two Angelinas in Table 1. The VLOOKUP function returns $130,000, which is the first Sales for both Angelinas. The Find Replace tool returns the second Sales number of $150,000 for Angelina. Neither is wrong, but they don’t tell the whole story. To get all the data, Last Name will be included. Table 4 is an updated version of Table 2 with both Angelinas. We will perform a lookup on First Name and Last Name.

 

table4.png

 

So how do we perform a VLOOKUP on multiple columns? And what is the Alteryx equivalent?

 

A new column for each table is created by concatenating the First Name and Last Name fields. Either the CONCATENATE function or the “&” symbol may be used. The VLOOKUP function and Find Replace tool must point to Concatenated Name as the field to lookup and find.

 

Table 3 shows an example of the new Concatenated Name field in cell A3.

Table 6 is an updated version of Table 4 containing the Concatenated Name field. It also shows the updated VLOOKUP function in cell K3.

Table3_Table6.png

 

While this can be used for Find Replace, I don’t recommend it. It’s not scalable when additional lookup columns are needed later. Concatenating so many fields is time consuming and can lead to many mistakes. Instead, let’s use Find Replace’s cousin the Join tool.

 

The Alteryx workflow looks nearly identical as above.

join_input_wf.png

 

Table 1 is identical as before and will be joined to Table 4. Table 6 is not used since the Join tool doesn’t need a concatenated field to merge datasets.

Table1_Table4.png

 

The Join tool will blend Table 1 and 4 together based on the selected fields. Here is the configuration of the Join tool:

 

 join_config.png

 

The section Join by Specific Fields merges the two tables together based on First Name and Last Name. This does exactly what we want without having to create an additional field.

 

Joining by multiple fields in the Join tool is exactly like using a VLOOKUP function with multiple columns.

 

The Join tool is easier to scale and manage as dashboard and data requirements evolve.

 

 

Each sales amount outputs correctly with each name. Here’s the output of the Join tool:

 

join_output.png

 

Let’s do a quick recap of what we just did with VLOOKUPs, the Find Replace tool, and the Join tool:

  • Looked up Name from Table 2 to find in Table 1 to return Sales for each person.
  • Showed how the lookup works in the VLOOKUP function and the Find Replace tool.
  • Create a concatenated field using First Name and Last Name. This shows how to use multiple fields for the VLOOKUP function. This can be applied to the Find Replace tool as well.
  • Showed how to configure the Join tool to merge data on multiple fields. In this context, it is equivalent to the VLOOKUP field with multiple criteria.

 

The Alteryx Join tools are a natural progression from Excel’s VLOOKUP. As great as the VLOOKUP function is, its limitations keep it from easy scalability and automation. If you’re familiar with combining the INDEX and MATCH functions, then you might’ve realized that the Join tools cover this as well.

 

While the VLOOKUP function played a major role in shaping my Alteryx foundation, it didn’t do so alone. Other concepts include Excel’s IF function and SQL database concepts such as groups, joins, unions, and dimension and fact tables. I won’t touch them in this post, but may do so in the future. I’m 50/50 on starting my 99-part blog series discussing Excel and Alteryx, so be on the lookout!

 

What helped build your foundation for using Alteryx? Everyone has a different path to enlightenment and I would love to hear yours! Smiley HappyWoman HappyMan Happy

Rithi Son
Product Engineer

Rithi is a Product Engineer that started at Alteryx in March 2016. He has worked as a reporting and data quality analyst in ecommerce and health care utilizing Excel and SQL. Rithi lives in Denver enjoying life in the Colorado front range.

Rithi is a Product Engineer that started at Alteryx in March 2016. He has worked as a reporting and data quality analyst in ecommerce and health care utilizing Excel and SQL. Rithi lives in Denver enjoying life in the Colorado front range.

Comments
Alteryx
Alteryx

Nice post Rithi. Remind me not to play Poker with you!

Atom
Great post to triger Alteryx journey! One point needs attention: if use Join, really need to notice that if there is N-M match, Alteryx will creat duplicate lines (total lines will become N*M from N). For example, you have 2 lines with input "A" and a two lines mapping "A-B; A- C", now after join, you will get AB AC AB AC four lines in total. It will break the original shape of your data...Anyone has ideal how to deal with this problem?
Bolide

Great post @RithiS! Excel was my bread and butter before Alteryx. I learned VBA and was able to code some wicked tools, processes, and reports in Excel. Then I transitioned from college student to full time analyst and was handed a big-boy data set. Access was too weak to handle it. SAS took forever to code, edit, and run. SQL got complicated quickly. Thankfully my company started me on Alteryx right away and I've never looked back. I still find Excel to be a powerful end product of my Alteryx data, but now I've transitioned to using Alteryx more than Excel (and enjoying Alteryx more than Excel)...a day I never thought would come.

 

@Isa - To answer your question...you've discovered what some call a Cartesian join. Here is another post by @JohnJPS on them. It sounds like you have 2 records of "A" on both the L and R inputs (2*2 = 4 resulting rows). To prevent it, you'll want to make your join more precise (join on more fields), or Unique/Summarize your data before the join to remove the second "A" on the right side. In other words, get rid of duplication on the join field(s). Hope this helps!

Alteryx
Alteryx

Thank you for the kind comments!

 

@Isa - the info provided by @DultonM is great. If you have other questions, I recommend posting them in one of the Discussions forums like Data Preparation & Blending

Great tip for those of us just coming over from Excel.  Nice Mitch Hedberg reference too Smiley Very Happy

Labels