
Alteryx Connect Knowledge Base

Definitive answers from Connect experts.

When Connect Performance Drops Significantly and What to Do About It

KaterinaRutova
Alteryx


 

Have you ever wondered what the limits of Connect are? How many entries can you store in Connect? Great questions! In fact, there is no hard limit on the number of entries. Performance tuning for the 2019.1 release was done with roughly 2 million entries.

 

However, there's a threshold beyond which Connect performance drops significantly and you're no longer able to:

 

  • harvest and load (or refresh) assets into Connect (the loading part) in a reasonable time.
    Here, reasonable time means that a synchronization or refresh doesn't take significantly longer than a person would expect, for example days instead of hours.
    Note: The initial load takes longer than any later synchronization, because all entries are created during the initial load, while a synchronization refreshes only the changed assets. Don't use the first run as a baseline when assessing reasonable time!
  • handle assets in Connect (the platform part) with a reasonable response time.
    Here, reasonable response time means that you can search, browse, or display assets very quickly: within milliseconds, or a few seconds at most.
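The note about baselines can be sketched in a few lines of Python. This is only an illustration, not a Connect feature: the function names, the sample durations, and the factor of 3 used to flag a run are all assumptions chosen for the example. The one point taken from the article is that the first run (the initial load) is excluded from the baseline.

```python
from statistics import median


def sync_baseline(durations_hours):
    """Return the median duration of all runs except the first one.

    The first entry is treated as the initial load and is skipped,
    because it creates every entry and is not representative.
    """
    later_runs = durations_hours[1:]
    if not later_runs:
        raise ValueError("need at least one synchronization after the initial load")
    return median(later_runs)


def is_unreasonable(latest_hours, baseline_hours, factor=3.0):
    """Flag a run that takes more than `factor` times the baseline.

    The factor is a hypothetical cut-off; pick one that matches
    what your users would expect.
    """
    return latest_hours > factor * baseline_hours


# Hypothetical run history in hours: initial load first, then synchronizations.
runs = [30.0, 2.0, 2.5, 2.0, 3.0]
baseline = sync_baseline(runs)
print(f"baseline: {baseline} h")
print("latest run unreasonable:", is_unreasonable(26.0, baseline))
```

A median is used rather than a mean so that a single slow synchronization does not shift the baseline much.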

 

Before You Start Performing Steps to Improve the Connect Performance

 

    • Consider the business value of storing so many entries in the Data Catalog. Not all data assets in production (e.g. reports) need to be stored in the catalog. Some assets probably don't need to be searchable or browsable.
    • Start the project by agreeing with the customer on the use cases the implementation will cover. Also look at the number of assets to be loaded into the system and at overall feasibility.
    • Run simulations on the target infrastructure and make adjustments if needed.

 

How to Increase the Threshold Beyond Which Performance Drops Significantly

 

It’s possible to push the threshold by doing the following: 

 

    • Adding more RAM: Add physical memory to the instance, then configure the Connect service to use this memory. See Adjust Memory Settings in the Connect online help.
    • Adding more CPUs: Add physical CPUs to the instance. This allows the threads of more user sessions to be processed in parallel.
      Note: Because licensing is based on CPU cores, this may affect the Connect license.
    • Moving from HDD to SSD disks: SSD I/O is much faster, which reduces the time spent waiting for responses.
    • Porting to a different database: This option is being tested at the moment.
Comments
jmelik

I'm happy to see that you're testing the ability to port to a different database. Are you able to share at this time what databases will potentially be supported? It would be a great benefit to us if we could port the database to one that is understood and managed by existing IT resources.

VojtechT
Alteryx

Hi @jmelik ,

 

The decision hasn't been made yet, as we are currently running proofs of concept (POCs) of the technologies.

I can say that we are considering:

  • MS SQL
  • MySQL
  • PostgreSQL

 

But let's be clear that porting to another database introduces significant latency between the database and Connect, and helps only for really big instances where the embedded H2 database is losing performance. At the moment, the break-even point is roughly 10 million objects.