I am running a fuzzy match workflow to match data between two tables. Here are the details:
Table1: 2.8 million rows
Table2: 7000 rows
I only have a single column to match on and to generate keys for the fuzzy match. My machine has 16GB of RAM and I've set the join/sort memory to 8096MB, but it still throws the low physical memory warning.
The source and target tables are both in Redshift and I am not using a bulk loader, but I doubt that is the issue: as the attached snip shows, the processing is really slow at the fuzzy match and the unique tool that follows it.
The process runs without errors, but performance is extremely slow: the fuzzy match completes 1% in 15 minutes.
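To put numbers on it, here is a rough back-of-the-envelope estimate in Python. It assumes the progress is linear and that, in the worst case, every Table1 row is compared against every Table2 row; I don't know how many comparisons the tool actually performs once match keys are generated, so treat this as an upper bound rather than a measurement.

```python
# Rough estimate only: assumes linear progress and a worst-case
# full cross comparison between the two tables (both are assumptions).
table1_rows = 2_800_000
table2_rows = 7_000

# Every Table1 row compared against every Table2 row.
worst_case_pairs = table1_rows * table2_rows

# Observed rate: 1% of the fuzzy match completes in 15 minutes.
minutes_per_percent = 15
estimated_total_hours = minutes_per_percent * 100 / 60

print(f"Worst-case candidate pairs: {worst_case_pairs:,}")
print(f"Estimated total runtime: {estimated_total_hours:.0f} hours")
```

At the current rate the fuzzy match step alone would take roughly a full day, which is why I am looking for ways to cut down the number of comparisons or otherwise speed it up.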
I am attaching the workflow as well as the sample files that I am using with it.
Please share ways to improve the speed of this workflow.