Map a huge dataset using a mixture of Join, Find and Replace, and Fuzzy Matching?


Hi,

 

I have a large dataset of over 10m rows. Within my source data I have Product ID, Product Description and Brand. I also have a 'clean' Product ID lookup table to map to.

 

The issue is that my source Product ID is often missing (it may instead be part of the Product Description), the Product ID field can contain other text, and so on. Also, I only want to map a subset of the product data (e.g. particular items, which may have only 200 unique lookup values), BUT these may be similar to other product types, in which case fuzzy matching may throw up unwanted matches.
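To make the "ID buried in other text" case concrete, here is a minimal Python sketch (an illustration of the logic, not an Alteryx tool) that searches free text for lookup IDs. The cleaning rule in `normalise` (uppercase, keep only letters and digits) and the helper name `find_embedded_ids` are hypothetical; real product IDs may need different rules. Comparing whole tokens rather than substrings is one way to avoid picking up a similar but longer ID that belongs to another product.

```python
import re


def normalise(text):
    # Hypothetical cleaning rule: uppercase and keep only letters and digits,
    # so "ab-1234" and "AB1234" both become "AB1234".
    return re.sub(r"[^A-Z0-9]", "", (text or "").upper())


def find_embedded_ids(text, lookup_ids):
    """Return lookup IDs that appear as a whole whitespace-separated token in text."""
    keys = {normalise(i): i for i in lookup_ids}  # normalised key -> original lookup ID
    tokens = {normalise(t) for t in (text or "").split()}
    # Whole-token comparison: "AB12345" is not treated as a hit for "AB1234".
    return sorted(keys[t] for t in tokens if t in keys)
```

For example, `find_embedded_ids("Widget AB-1234 blue (pack of 2)", ["AB-1234", "XY-9000"])` returns `["AB-1234"]`, while a description containing only `AB-12345` returns an empty list.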

 

So... can anyone suggest a suitable hierarchy/workflow for this?

 

I've thought of a possible workflow:

 

  1. Match directly on Product ID in the first instance.
  2. For these matches, get the unique Brands and use them to create a subset of the records that didn't match in Step 1 (this reduces the likelihood of matching Product IDs that look similar to IDs in the lookup but belong to other products).
  3. Take this subset and carry out a Find & Replace on the Product ID.
  4. Would a Fuzzy Match on the Product Code be the next step? As I could have hundreds of thousands of records, could this be time-consuming and crash the PC/server?
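The four steps above could be sketched in Python roughly as follows. This illustrates the matching hierarchy only; it is not how Alteryx's Join, Find Replace or Fuzzy Match tools work internally. The field names (`product_id`, `description`, `brand`), the cleaning rule, the 0.85 cutoff and the use of Python's standard `difflib` for the fuzzy step are all assumptions. Each record is tagged with the step that matched it, so fuzzy matches can be reviewed separately from exact ones.

```python
import difflib
import re


def normalise(text):
    # Assumed cleaning rule: uppercase and keep only letters and digits.
    return re.sub(r"[^A-Z0-9]", "", (text or "").upper())


def tiered_match(records, lookup_ids, cutoff=0.85):
    """Return (record, matched lookup ID or None, tier) for every record."""
    clean = {normalise(i): i for i in lookup_ids}  # normalised key -> lookup ID
    keys = list(clean)
    results, unmatched = [], []

    # Step 1: exact match on the normalised Product ID.
    for rec in records:
        key = normalise(rec["product_id"])
        if key in clean:
            results.append((rec, clean[key], "exact"))
        else:
            unmatched.append(rec)

    # Step 2: only brands seen among the exact matches stay in scope.
    brands = {rec["brand"] for rec, _, _ in results}

    for rec in unmatched:
        if rec["brand"] not in brands:
            results.append((rec, None, "out_of_scope"))
            continue
        # Step 3 (Find & Replace-style): a lookup ID embedded as a whole
        # token in either the Product ID or the description.
        text = f"{rec['product_id'] or ''} {rec['description'] or ''}"
        hits = sorted({normalise(t) for t in text.split()} & set(keys))
        if hits:
            results.append((rec, clean[hits[0]], "embedded"))
            continue
        # Step 4: fuzzy match the Product ID against the lookup keys only.
        close = difflib.get_close_matches(
            normalise(rec["product_id"]), keys, n=1, cutoff=cutoff
        )
        if close:
            results.append((rec, clean[close[0]], "fuzzy"))
        else:
            results.append((rec, None, "unmatched"))
    return results
```

In this sketch the fuzzy step compares each remaining in-scope row only against the lookup keys (about 200 values in the case described), so the work grows with rows × lookup values rather than rows × rows. That candidate restriction is likely what matters most for keeping step 4 manageable on large inputs.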

 

Any thoughts or advice would be very much appreciated.

 

Regards,

 

Fiorano

Alteryx

Hi Fiorano,

 

Can you please provide sample data and any progress you've made already in a workflow?

 

Best,

 

John Posada
Cloud Engineer