Showing results for 
Search instead for 
Did you mean: 

Alteryx designer Discussions

Find answers, ask questions, and share expertise about Alteryx Designer.

Matching and Joining data for similar values within fields

I have 2 data sets with similar data in the supplier field (common to both data sets).  I need the data to join based on the first few characters of that field.  I attempted fuzzy matching, but was not able to make it work.  Any suggestions?


Date set 1

Supplier          Sum of Sales

ABC                  10000

DEFG                   5000


Data set 2

Supplier           Sum of Ships

ABC Company Inc.       11000

DEFG Corp.                     5500


Desired result

Supplier           Sum of Sales         Sum of Ships

ABC                  10000                        11000

DEFG                 5000                            5500

Alteryx Certified Partner
Alteryx Certified Partner

Hi @JenniferLaVite 


Is the first Dataset composed by only one word? 



Not necessarily.  SOme company names contain 2 words, some contain only one.


Hi @JenniferLaVite 


This is probably to simplistic, but do you have a length that always matches?  For your sample data, extracting the first 3 characters from each data set and matching on that will work. 



Alteryx Certified Partner
Alteryx Certified Partner

Hey @JenniferLaVite 


I could present you two approaches (that depends much on your data to work)





- Append company names to Supplier original dataset

- Use REGEX_MATCH function to identify an exact match in the Company Name dataset and return the Sum of Ships 


This can consume a lot of resources depending on the size of the dataset



- Use R tool with sapply function. This will take a single value and look at a whole field for matches (with grep function, which is regex equivalent)


dataset1 <- read.Alteryx("#1", mode = "data.frame")
dataset2 <- read.Alteryx("#2", mode = "data.frame")
dataset1$SumofShips <- sapply(dataset1$Supplier, function(x) max(dataset2[grep(x,dataset2$Supplier),2]))
write.Alteryx(dataset1, 1)
write.Alteryx(dataset2, 2)



This solution consumes less resources, but it can get tricky since it uses R code.


Workflow attached with both solutions on version 2019.1 of Alteryx.