Input Data-1: Text Input or Excel spreadsheet with a list of client names. Each row holds multiple clients separated by a delimiter (##), so I used the Text To Columns tool to split them into individual clients.
Input Data-2: SQL table in which 4 columns hold client details, so I used the Transpose tool to create a separate row for each client column. This table is large, at around 1 million records (rows).
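For anyone who does not have the workflow open, the two preparation steps above are roughly equivalent to the Python sketch below. It is only an illustration: the sample rows, the column names (`client_1` to `client_4`) and the `id` field are made up, since the real data is confidential, and the functions only mimic what the Text To Columns and Transpose tools do in my workflow.

```python
# Hypothetical sample data; the real inputs are confidential.
input_1_rows = ["Acme Corp##Globex Ltd", "Initech"]
input_2_rows = [
    {"id": 1, "client_1": "Acme Corporation", "client_2": "Globex",
     "client_3": "", "client_4": None},
]

# Assumed names for the 4 client columns in the SQL table.
CLIENT_COLUMNS = ["client_1", "client_2", "client_3", "client_4"]


def split_clients(rows, delimiter="##"):
    """Mimic Text To Columns: return one entry per client name."""
    clients = []
    for row in rows:
        for name in row.split(delimiter):
            name = name.strip()
            if name:  # ignore empty pieces such as a trailing delimiter
                clients.append(name)
    return clients


def transpose_clients(records, columns=CLIENT_COLUMNS):
    """Mimic Transpose: return one row per (record, client column) pair."""
    out = []
    for record in records:
        for column in columns:
            value = record.get(column)
            if value:  # skip empty or missing client cells
                out.append({
                    "id": record["id"],
                    "source_column": column,
                    "client_name": value.strip(),
                })
    return out


input_1_clients = split_clients(input_1_rows)
input_2_clients = transpose_clients(input_2_rows)
```

Note that the transpose step can produce up to 4 client rows per SQL record, so the list that goes into the match can be several times larger than the table itself.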
I have designed a workflow that matches the clients from Input Data-1 against the Client Name in Input Data-2 using the Fuzzy Match tool. With the e1 Engine, the workflow took around 5.5 hours to complete the fuzzy match. Running the same workflow with the AMP Engine (4 threads) took around 2 to 2.5 hours. Even this is too long, and the business stakeholders are not happy with it. The workflow was run on a machine with the following configuration:
Windows Server 2016
16 GB RAM with 4 Cores (Virtual processors)
Alteryx Version 2021.3.3.63061
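To make the matching step described above concrete, here is a minimal Python sketch of an all-pairs fuzzy comparison. It is not the algorithm the Fuzzy Match tool uses: `difflib.SequenceMatcher` from the Python standard library is only a stand-in, and the sample names, the `normalize` helper and the 0.8 threshold are assumptions for illustration.

```python
from difflib import SequenceMatcher


def normalize(name):
    """Lower-case and collapse whitespace so trivial differences are ignored."""
    return " ".join(name.lower().split())


def fuzzy_match(left_names, right_names, threshold=0.8):
    """Compare every left name with every right name and keep close pairs."""
    matches = []
    for left in left_names:
        for right in right_names:
            score = SequenceMatcher(
                None, normalize(left), normalize(right)
            ).ratio()
            if score >= threshold:
                matches.append((left, right, round(score, 3)))
    return matches


# Hypothetical sample names standing in for the two inputs.
left = ["Acme Corp", "Initech"]
right = ["ACME Corp.", "Globex Ltd", "Initech"]

matches = fuzzy_match(left, right)
comparisons = len(left) * len(right)  # work grows with the product of both lists
```

In a naive comparison like this one, the number of comparisons is the product of the two list sizes, which is why the size of the transposed SQL data matters so much for run time.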
Our requirement is to build the workflow in Designer as an Analytic App and share it with the business users so they can run it from their laptops. I have a few questions about this, as the App will be deployed on a server and the users will open it from a web UI.
- The default memory usage for AMP is 25% of the available machine memory. Does this refer to the memory of the server on which the Analytic App is deployed, or to the user's machine from which they open and run it?
- My understanding is that increasing the RAM and the number of cores will improve the performance of the AMP workflow. Is there any limit on the RAM and the number of cores/processors it can use?
- If multiple users run this App/workflow at the same time from their respective machines, what will be the impact on performance?
- As the fuzzy match is performed on more than a million records, it takes a long time to complete. Any suggestions to improve the performance of the workflow?
I have attached a screenshot of the workflow for reference. Due to data confidentiality, I'm unable to share the workflow itself; if required, I can replace the input data with dummy data and share it.
Any help is much appreciated.