

AMP with Fuzzy Match Performance - Taking too long for a huge amount of data

urfriendumesh
7 - Meteor

Input Data-1: Text Input or Excel spreadsheet with a list of client names. Each row contains multiple clients separated by a delimiter (##), so I used the Text To Columns tool to split them into individual clients.
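For readers who want to reproduce the split step outside Designer, here is a minimal Python sketch of the same idea. It is not what the Text To Columns tool does internally; the function name and the sample client names are made up for illustration, and only the `##` delimiter comes from the post.

```python
DELIMITER = "##"  # delimiter described in the post


def split_clients(rows):
    """Split each delimited row into individual, trimmed client names."""
    clients = []
    for row in rows:
        for name in row.split(DELIMITER):
            name = name.strip()
            if name:  # skip empty pieces caused by trailing delimiters
                clients.append(name)
    return clients


# Hypothetical sample data, not taken from the original workflow.
sample_rows = ["Acme Corp##Globex Ltd", "Initech## Umbrella Inc ##"]
print(split_clients(sample_rows))
```

Trimming and dropping empty pieces up front keeps blank or padded names out of the later matching step.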

 

Input Data-2: SQL table in which 4 columns contain client details, so I used the Transpose tool to create a separate row for each client column. The table is large, with around 1 million records (rows).
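The reshaping described here (four client columns turned into one row per client) can be sketched in Python roughly as follows. The column names `client_1` to `client_4` and the key field `record_id` are assumptions for illustration, since the real table schema is not shown in the post.

```python
# Hypothetical column names; the actual SQL table schema is not given.
CLIENT_COLUMNS = ["client_1", "client_2", "client_3", "client_4"]


def transpose_clients(records, key_field="record_id"):
    """Emit one output row per non-empty client column of each record."""
    rows = []
    for record in records:
        for column in CLIENT_COLUMNS:
            value = record.get(column)
            if value:  # skip missing or empty client values
                rows.append(
                    {
                        "record_id": record[key_field],
                        "source_column": column,
                        "client_name": value,
                    }
                )
    return rows


sample = [{"record_id": 1, "client_1": "Acme Corp", "client_2": "", "client_3": "Globex Ltd"}]
for row in transpose_clients(sample):
    print(row)
```

Note that this step can multiply the row count by up to four, so 1 million records may become several million names going into the match.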

 

I have designed a workflow that matches the clients from Input Data-1 against the Client Name in Input Data-2 using the Fuzzy Match tool. The standard workflow on the e1 Engine took around 5.5 hours to perform the fuzzy match. Running the same workflow on the AMP Engine (4 threads) took around 2-2.5 hours to return results. Even this is a long time, and the business stakeholders are not happy with it. Below is the configuration of the machine on which the workflow was run.

 

Windows Server 2016

16 GB RAM with 4 Cores (Virtual processors)

Alteryx Version 2021.3.3.63061

 

Our requirement is to build the workflow in Designer as an Analytic App and share it with the business users so they can run it from their laptops. I have a few questions about this, as the App will be deployed on a server and the users will open it from a web UI.

 

  1. The default memory usage for AMP is 25% of the available machine memory. Does this refer to the memory of the server on which the Analytic App is deployed, or of the user's machine from which they open and run it?
  2. Increasing the RAM and the number of cores should improve the performance of the AMP workflow. Is there any limit on the RAM and the number of cores/processors?
  3. If multiple users run this App/workflow at the same time from their respective machines, what will be the impact on performance?
  4. As the fuzzy match is performed on more than a million records, it takes a long time to complete. Any suggestions to improve the performance of the workflow?

 

I have attached a screenshot of the workflow for reference. Due to data confidentiality, I'm unable to share the workflow as is; if required, I can replace the input data with dummy data and share it.

 

Any help is much appreciated.

MarqueeCrew
20 - Arcturus

@urfriendumesh ,

 

Looks like you've put together a proper write-up for Alteryx to review.  This should get to Product Management with the AMP Engine tag.  Here are a few ideas to help you:

 

  1. For now, run without AMP.
  2. Break the workflow into pre-fuzzy-match and post-fuzzy-match parts.
  3. Use join tools (Join and/or Find Replace) to find exact duplicates.
  4. If you've got lots of rules in the fuzzy match (name, address, phone, email, etc.), you might want multiple match passes with different settings, only putting data into each pass where the content is present (e.g. a record must have a phone number for a name + phone rule).
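To illustrate why the exact-match pass helps, here is a minimal Python sketch of the "exact first, fuzzy on the remainder" idea. It uses `difflib.SequenceMatcher` from the standard library as a stand-in scorer, which is not the algorithm the Fuzzy Match tool uses; the function names, the 0.85 threshold, and the sample names are all assumptions for illustration.

```python
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase and collapse whitespace so trivial differences match exactly."""
    return " ".join(name.lower().split())


def match_clients(lookup_names, table_names, threshold=0.85):
    """Resolve exact matches via a dictionary, then fuzzy-match only the rest."""
    # Deduplicate the large side once; repeated names are compared only once.
    table_by_key = {}
    for name in table_names:
        table_by_key.setdefault(normalize(name), name)

    exact, fuzzy, unmatched = {}, {}, []
    for name in lookup_names:
        key = normalize(name)
        if key in table_by_key:  # cheap exact pass (the Join step)
            exact[name] = table_by_key[key]
            continue
        # Expensive pass: only names that failed the exact match get here.
        best_name, best_score = None, 0.0
        for candidate_key, candidate in table_by_key.items():
            score = SequenceMatcher(None, key, candidate_key).ratio()
            if score > best_score:
                best_name, best_score = candidate, score
        if best_score >= threshold:
            fuzzy[name] = best_name
        else:
            unmatched.append(name)
    return exact, fuzzy, unmatched


# Hypothetical sample data.
exact, fuzzy, unmatched = match_clients(
    ["ACME Corp", "Globex Limted", "Zzz"],
    ["Acme Corp", "Globex Limited", "Initech"],
)
print(exact, fuzzy, unmatched)
```

The point of the design is that every name resolved by the dictionary lookup, and every duplicate removed from the large table, is one fewer input to the pairwise comparison that dominates the run time.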

 

Cheers,

 

Mark

Alteryx ACE & Top Community Contributor

urfriendumesh
7 - Meteor

Thanks @MarqueeCrew for the response. Sorry for the late reply; I was away for a few days.

I have tried all the suggestions, but the workflow still takes a long time to return results.

I have also tagged this query with AMP Engine and Fuzzy Match, but I'm not sure how to get it to Product Management.

urfriendumesh
7 - Meteor

Can someone please help me with this request, or assist me in getting it to the Alteryx Product team?

TonyaS
Alteryx

@urfriendumesh 

I have seen this post. We have definitely made improvements in Fuzzy Match recently.

Your questions seem to be related to running in Gallery or Server, so I will likely need to involve multiple people to get you some answers, but I will work on it.

 

In the meantime, if you can provide me with a pared-down version of your workflow with dummy data that reproduces the issue, along with specifics about the Server environment settings you are running on, we can try to reproduce it and identify whether it is improved in more recent versions.

 

Tonya Smith
Sr. Technical Product Manager, cloud App Builder