I am running a fuzzy match workflow to match data between two tables. Here are the details:
Table1: 2.8 million rows
Table2: 7000 rows
I only have a single column to match on and to generate keys for the fuzzy match. I have a 16 GB RAM machine and I've set the join/sort memory to 8096 MB, but it still throws the low physical memory warning.
The source and target tables are both in Redshift, and I am not using a bulk loader, but I doubt that is the issue since, as the attached snip shows, processing is really slow at the Fuzzy Match tool and the following Unique tool.
The process runs fine, but performance is very slow: the Fuzzy Match tool completes about 1% of progress every 15 minutes.
I am attaching the workflow as well as sample files that I am using for the purpose.
Please share ways to improve the speed of this workflow.
That sounds obvious, but it definitely didn't occur to me earlier. Thanks, Fernando. The workflow does run much faster with about 7000 rows. It would be great if there were some guidelines around optimal use of Fuzzy Match.
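The reason reducing the row count helps so dramatically is that fuzzy matching without good keys tends toward comparing every record against every other, so cost grows with the product of the two table sizes. The sketch below (plain Python with the standard library's `difflib`, not Alteryx-specific; the 3-character blocking key and the 0.8 threshold are illustrative assumptions) shows the idea behind key generation: only records sharing a key are compared, so a 2.8M-row table never gets compared row-by-row against everything.

```python
# Illustrative sketch of key-based blocking for fuzzy matching.
# Not the Alteryx Fuzzy Match implementation; the blocking key
# (first 3 normalized characters) and threshold (0.8) are assumptions.
from difflib import SequenceMatcher

table1 = ["Acme Corp", "Acme Corporation", "Beta LLC", "Gamma Inc"]
table2 = ["ACME Corp.", "Gama Inc"]

def key(name):
    # Hypothetical blocking key: lowercase, strip periods, first 3 chars.
    return name.lower().replace(".", "")[:3]

# Group table1 rows by key so each table2 row is compared only
# against its own block instead of all of table1.
blocks = {}
for name in table1:
    blocks.setdefault(key(name), []).append(name)

matches = []
for name in table2:
    for candidate in blocks.get(key(name), []):
        score = SequenceMatcher(None, name.lower(), candidate.lower()).ratio()
        if score >= 0.8:
            matches.append((name, candidate, round(score, 2)))
```

With a reasonable key, each incoming record touches only a small block of candidates, which is the same effect as pre-filtering Table1 down to the rows that could plausibly match before running the expensive comparison.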