community
cancel
Showing results for 
Search instead for 
Did you mean: 

Alteryx designer Discussions

Find answers, ask questions, and share expertise about Alteryx Designer.
Upgrade Alteryx Designer in 10 Steps

Debating whether or not to upgrade to the latest version of Alteryx Designer?

LEARN MORE

Matching and Joining data for similar values within fields

I have 2 data sets with similar data in the supplier field (common to both data sets).  I need the data to join based on the first few characters of that field.  I attempted fuzzy matching, but was not able to make it work.  Any suggestions?

 

Date set 1

Supplier          Sum of Sales

ABC                  10000

DEFG                   5000

 

Data set 2

Supplier           Sum of Ships

ABC Company Inc.       11000

DEFG Corp.                     5500

 

Desired result

Supplier           Sum of Sales         Sum of Ships

ABC                  10000                        11000

DEFG                 5000                            5500

Alteryx Certified Partner

Hi @JenniferLaVite 

 

Is the first Dataset composed by only one word? 

 

Cheers,

Not necessarily.  SOme company names contain 2 words, some contain only one.

Pulsar

Hi @JenniferLaVite 

 

This is probably to simplistic, but do you have a length that always matches?  For your sample data, extracting the first 3 characters from each data set and matching on that will work. 

 

Dan

Alteryx Certified Partner

Hey @JenniferLaVite 

 

I could present you two approaches (that depends much on your data to work)

 

workflow.PNG

 

First:

- Append company names to Supplier original dataset

- Use REGEX_MATCH function to identify an exact match in the Company Name dataset and return the Sum of Ships 

 

This can consume a lot of resources depending on the size of the dataset

 

Second:

- Use R tool with sapply function. This will take a single value and look at a whole field for matches (with grep function, which is regex equivalent)

 

dataset1 <- read.Alteryx("#1", mode = "data.frame")
dataset2 <- read.Alteryx("#2", mode = "data.frame")
dataset1$SumofShips <- sapply(dataset1$Supplier, function(x) max(dataset2[grep(x,dataset2$Supplier),2]))
write.Alteryx(dataset1, 1)
write.Alteryx(dataset2, 2)

 

 

This solution consumes less resources, but it can get tricky since it uses R code.

 

Workflow attached with both solutions on version 2019.1 of Alteryx.


Cheers,

Labels