Help Needed with Workflow Optimization in Alteryx Designer & Integration with Splunk

emma_Wilson
6 - Meteoroid

Hello Alteryx Community,

I'm working on a project that involves data cleansing and transformation using Alteryx Designer, with an additional requirement to integrate the processed data into Splunk for further analysis. I've encountered a challenge that I hope you can assist me with.

 

I have a large dataset containing sales data, and I'm trying to optimize a workflow that includes joining multiple tables, filtering specific columns, performing some aggregations, and then sending the transformed data to Splunk. Although the workflow is functional, it's running slower than I'd like, especially during the joining process and Splunk integration.

 

Here's a brief overview of my workflow:

Input Data: Reading data from multiple CSV files.
Joining Tables: Using the Join tool to combine sales data with product and customer information.
Data Cleansing: Removing null values, correcting data types, etc.
Aggregations: Summing sales by region and product category.
Splunk Integration: Sending the cleaned and aggregated data to Splunk for further analysis using the Alteryx Splunk Connector.
Output: Writing the final result to an Excel file.

 

Could anyone please provide tips on how to optimize this workflow? Specifically, I'm looking for advice on improving the performance of the joining process and the integration with Splunk. Are there any specific tools or techniques within Alteryx or settings within the Splunk connector that can help in this scenario? If you need more information, please let me know, and I can provide additional details.

 

Thank you in advance for your help! Your insights and experience with both Alteryx and Splunk will be greatly appreciated.

1 REPLY
caltang
17 - Castor

Hi @emma_Wilson - thank you for the clear and succinct explanation.

 

Before the tips below: make sure you are on the latest version of Alteryx and that the AMP engine is turned on (in case you turned it off). Generally, the bulk of the slowness comes from the Join and Summarize tools.

 

With regard to the connection to Splunk, have you followed the best practice guide here: https://community.alteryx.com/t5/Engine-Works/Splunk-will-it-Alteryx/ba-p/554043 ?

 

As for your request, here are some tips:

  • Input Data: Since you are reading multiple CSV files, are you using a Batch Macro or the Dynamic Input tool? If the files are large, you can convert them to YXDB first and then load those in your workflow. With a Control Container, you can do both steps in the same workflow. Something like this:

[Screenshot: image.png]

  • Joins: As mentioned, most of the load comes from the Join and Summarize tools, so using YXDB inputs helps speed things up here as well. Beyond that, I highly recommend checking the keys you join on - make sure they are the same data type on both sides, and avoid many-to-many matches, which multiply the number of output rows.
  • Data Cleansing: If you are referring to feature engineering, the Data Cleansing tool is powerful, but note that it is actually stored as a standard macro. I would suggest cleansing your data and doing any feature engineering at the start rather than after the Joins, so that fewer and cleaner records flow into them.
  • Aggregations: Same as Joins above.
  • Splunk Integration: This can be a bit clunky at times because you are writing to an external system. If it takes up too much processing power, I would recommend a chained app: the first workflow does the ETL, then a second workflow kicks off and sends the result to Splunk.
  • Output: Same as Splunk. Also try not to leave any Browse tools in the workflow, as they take up processing power.
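
To make the join-key advice concrete, here is a minimal Python sketch of the two checks (matching key types, no many-to-many matches). It only illustrates the logic and is not something Designer runs for you; the function names, the `product_id` field, and the sample records are all hypothetical. Inside Designer you would do the equivalent with the tools you already have, for example by counting records per key on each input before the Join.

```python
from collections import Counter


def key_profile(rows, key):
    """Return (values that occur more than once, set of type names) for a key."""
    counts = Counter(row[key] for row in rows)
    duplicated = {value: n for value, n in counts.items() if n > 1}
    types = {type(row[key]).__name__ for row in rows}
    return duplicated, types


def check_join(left, right, key):
    """Return a list of warnings about a planned join of left and right on key."""
    warnings = []
    left_dupes, left_types = key_profile(left, key)
    right_dupes, right_types = key_profile(right, key)
    if left_types != right_types:
        warnings.append(
            f"type mismatch on {key}: {sorted(left_types)} vs {sorted(right_types)}"
        )
    # A key value repeated on BOTH sides multiplies rows (many-to-many).
    shared = set(left_dupes) & set(right_dupes)
    if shared:
        warnings.append(f"many-to-many on {key}: {sorted(shared, key=str)}")
    return warnings


# Hypothetical sample data: sales stores the key as text.
sales = [
    {"product_id": "101", "amount": 250.0},
    {"product_id": "101", "amount": 120.0},
    {"product_id": "102", "amount": 75.0},
]
# Lookup table where the key is numeric instead of text.
products_int = [
    {"product_id": 101, "category": "Bikes"},
    {"product_id": 102, "category": "Helmets"},
]
# Lookup table with a repeated key, which would duplicate sales rows.
products_dupe = [
    {"product_id": "101", "category": "Bikes"},
    {"product_id": "101", "category": "Bikes (old)"},
    {"product_id": "102", "category": "Helmets"},
]

if __name__ == "__main__":
    print(check_join(sales, products_int, "product_id"))
    print(check_join(sales, products_dupe, "product_id"))
```

The first call flags the text-versus-number key, and the second flags the repeated `101` on both sides. A one-to-many join (repeats on one side only) passes, since that is the normal shape for sales joined to a product lookup.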

 

Beyond that, use fewer tools wherever possible - and making the workflow dynamic helps.

 

If the above solved your need, kindly like & mark it as the accepted solution so that others can find it more quickly and the thread can be closed. Thanks!

Best regards,
Calvin Tang
https://www.linkedin.com/in/calvintangkw/

Alteryx ACE