Alteryx Designer Discussions

SOLVED

Super slow fuzzy match workflow

I am running a fuzzy match workflow to match data between two tables. Here are the details:

Table1: 2.8 million rows

Table2: 7000 rows

I only have a single column to match on and to generate keys from for the fuzzy match. I have a 16GB RAM machine and I've set the join/sort memory to 8096MB, but it still throws the low physical memory warning.

The source and target tables are both in Redshift, and I am not using a bulk loader. I doubt that is the issue, though: as the attached snip shows, the processing is really slow at the Fuzzy Match tool and at the Unique tool that follows it.

 

The process runs without errors, but performance is super slow: the Fuzzy Match tool completes 1% in 15 minutes.

 

I am attaching the workflow as well as sample files that I am using for the purpose.

Please share ways to improve the speed of this workflow.

Alteryx Certified Partner

Hi @nimeshkhatri ,

 

One thing that I would do, since you have a lot of identical data, is to summarize your client data before it enters the Fuzzy Match tool, so that each distinct value is matched only once.
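To illustrate why summarizing first helps, here is a minimal Python sketch of the same idea outside Alteryx. It is not what the Fuzzy Match tool does internally: it uses the standard library's difflib.SequenceMatcher as a stand-in scorer, and the function name, the 0.8 threshold, and the sample names are all assumptions made up for the example.

```python
from difflib import SequenceMatcher


def dedupe_then_match(big_names, small_names, threshold=0.8):
    """Score each distinct name once, then map the result back to every row.

    threshold is an illustrative cutoff, not an Alteryx setting.
    """
    # Collapse repeated values first, so the expensive comparison loop
    # runs once per distinct name instead of once per row.
    distinct = set(big_names)

    best_for = {}
    for name in distinct:
        best, best_score = None, 0.0
        for candidate in small_names:
            score = SequenceMatcher(None, name.lower(), candidate.lower()).ratio()
            if score > best_score:
                best, best_score = candidate, score
        best_for[name] = best if best_score >= threshold else None

    # Fan the per-distinct-value result back out to the original rows.
    return [best_for[name] for name in big_names]


rows = ["Acme Corp", "Acme Corp", "Globex Inc", "Acme Corp"]
targets = ["ACME Corp.", "Globex Inc."]
matches = dedupe_then_match(rows, targets)
```

With four input rows but only two distinct names, the inner comparison loop runs for two names rather than four. The saving grows with the amount of repetition in the larger table.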

Another thing: I have noticed that you are generating keys for each word and leaving some words out.

 

It depends on your data, but I would uncheck this option and use a Find Replace tool to remove those common words before the fuzzy match. Since you have a RecordID, you can get your original company name back later.
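The clean-up step described above can be sketched in Python as follows. This only illustrates the idea of stripping common words while keeping a record ID to recover the original name. It is not the Find Replace tool itself, and the word list and function names are hypothetical.

```python
import re

# Hypothetical list of low-information words; tune it to your own data.
COMMON_WORDS = {"inc", "llc", "ltd", "corp", "co", "company", "the"}


def clean_name(name):
    """Lowercase the name, drop punctuation, and remove common words."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    return " ".join(t for t in tokens if t not in COMMON_WORDS)


def build_lookup(names):
    """Return (record_id, cleaned_name) pairs and an id -> original name map."""
    originals = dict(enumerate(names, start=1))
    cleaned = [(rid, clean_name(n)) for rid, n in originals.items()]
    return cleaned, originals


cleaned, originals = build_lookup(["The Acme Company, Inc.", "Globex LLC"])
```

The fuzzy match then runs on the cleaned names only, and the record ID in each pair is used afterwards to look the original company name up in originals.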

 

Let me know if that helps.

Best,

Fernando Vizcaino

That sounds obvious, but it definitely didn't occur to me earlier. Thanks, Fernando. The workflow does run much faster with about 7000 rows. It would be great if there were some guidelines on the optimal use of fuzzy match.
