
Alteryx Designer Desktop Discussions

SOLVED

Super slow fuzzy match workflow

nimeshkhatri
7 - Meteor

I am running a fuzzy match workflow to match data between two tables. Here are the details:

Table1: 2.8 million rows

Table2: 7000 rows

I only have a single column to match on and to generate keys for the fuzzy match. My machine has 16GB of RAM and I've set the join/sort memory to 8096MB, but it still throws the low physical memory warning.

The source and target tables are both in Redshift, and I am not using a bulk loader. I doubt that is the issue, though: as the attached snip shows, the slow steps are the Fuzzy Match tool and the Unique tool that follows it.

 

The process runs fine but I face super slow performance: fuzzy match completes 1% in 15 minutes.

 

I am attaching the workflow as well as sample files that I am using for the purpose.

Please share ways to improve the speed of this workflow.

fmvizcaino
17 - Castor

Hi @nimeshkhatri ,

 

One thing I would do, since you have a lot of identical data, is summarize your client field before it enters the Fuzzy Match tool, so that each distinct value is matched only once.
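To illustrate the idea outside Alteryx, here is a minimal Python sketch of collapsing identical names before matching. It is not the workflow itself: the `dedupe_names` helper and the sample records are hypothetical, and in Designer the same step would be done with a Summarize tool.

```python
from collections import defaultdict

def dedupe_names(records):
    """Group record IDs by identical name so each distinct name is matched once."""
    ids_by_name = defaultdict(list)
    for record_id, name in records:
        ids_by_name[name].append(record_id)
    return dict(ids_by_name)

# Hypothetical sample data: (record ID, client name)
records = [(1, "Acme Corp"), (2, "Acme Corp"), (3, "Globex Inc"), (4, "Acme Corp")]

ids_by_name = dedupe_names(records)

# Only the distinct names need to go through the expensive fuzzy match.
distinct_names = list(ids_by_name)
```

Once the distinct names are matched, `ids_by_name` lets you spread each match result back to every original record that shared the name.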

Another thing: I noticed that you are generating keys for each word and leaving some words out.

 

It depends on your data, but I would uncheck this option and remove those common words beforehand with a Find Replace tool. Since you have a RecordID, you can recover the original company name afterwards.
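As a rough sketch of that clean-up step, written in Python rather than as Alteryx tools: the `COMMON_WORDS` list, the `strip_common_words` helper, and the sample records are all assumptions for illustration, and the right word list depends on your data.

```python
import re

# Hypothetical list of common words to drop before matching.
COMMON_WORDS = {"inc", "corp", "ltd", "llc", "co", "company"}

def strip_common_words(name):
    """Lowercase the name, split it into alphanumeric tokens, drop common words."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    kept = [t for t in tokens if t not in COMMON_WORDS]
    return " ".join(kept)

# Hypothetical sample data: (record ID, original company name)
records = [(1, "Acme Corp."), (2, "Globex, Inc"), (3, "Initech LLC")]

# Cleaned names are what get fuzzy matched.
cleaned = {record_id: strip_common_words(name) for record_id, name in records}

# The record ID is the way back to the untouched original name.
originals = dict(records)
```

Matching runs on `cleaned`, and the record ID looks up the original name in `originals` afterwards.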

 

Let me know if that helps.

Best,

Fernando Vizcaino

nimeshkhatri
7 - Meteor

That sounds obvious, but it definitely didn't occur to me earlier. Thanks, Fernando. The workflow does run much faster with about 7000 rows. It would be great if there were some guidelines on the optimal use of Fuzzy Match.
