Engine Works

Under the hood of Alteryx: tips, tricks and how-tos.
awrangler
Alteryx Alumni (Retired)


Performance tuning is an ongoing effort as we design, develop, and deploy our flows. We have assembled 3 complex flow scenarios along with potential solutions. The objective is to streamline our flows, make troubleshooting easier, and improve overall efficiency. The 3 scenarios demonstrate how you can restructure your flows and use the Plans feature to orchestrate your job runs. By exploring these potential solutions, you can find ways to reduce computational costs and eliminate redundancy when developing your data pipeline!

 

Scenario 4: Direct Unions from Data Sources Slowing Job Runs

 

image002.png

Diagram 1

 

In our fourth complex flow scenario, conceptually depicted above in Diagram 1, a single flow contains several different logic pieces. Figure 1 below shows an example of this flow scenario in DCTC.

 

image004.png

Figure 1

 

These different logic pieces can be broken into smaller components or flows for a smoother and more orchestrated execution. Additionally, breaking up one monolithic complex flow into its different logic pieces makes the logic easier to understand and eases future troubleshooting.

A possible solution to simplify this complex flow is to break its different logic pieces into 3 smaller flows, as shown conceptually below in Diagram 2.

 

image005.png

Diagram 2

 

Flow 1 can contain all the logic up to the join recipe, as in Figure 2.

 

image007.png

Figure 2


Flow 2 can contain all the logic after the join recipe up to the common transformation recipe. It picks up where Flow 1 ends by using a Reference Dataset, as in Figure 3.

 

image008.png

Figure 3


Flow 3 can contain all the logic after the common transformation recipe up to publishing, and it starts from an intermediate file. The intermediate file would be the output metadata of Flow 2. It would be helpful to leverage parameters in Flow 3 to specify which metadata to pull in dynamically for each run, as shown below in Figure 4.

 

image009.png

Figure 4

 

With the different logic pieces split across those 3 flows, we can leverage the Plans feature to orchestrate the execution, as conceptually reflected in Diagram 3 below.
 

image010.png

Diagram 3

 

For a more concrete DCTC plan example, see Figure 5 below.

 

image012.png

Figure 5
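The ordering in this plan can be pictured as a simple chain of tasks in which each flow runs only after the previous one succeeds. The Python sketch below is only a conceptual model, not DCTC code or its API: `run_plan` and the three placeholder flow functions are hypothetical names, and it assumes that a failed task stops the tasks after it.

```python
from typing import Callable, Dict, List, Tuple


# Hypothetical stand-ins for the three flows; each returns True on success.
def flow_1() -> bool:
    # All logic up to the join recipe.
    return True


def flow_2() -> bool:
    # Logic after the join recipe, up to the common transformation recipe.
    return True


def flow_3() -> bool:
    # Logic after the common transformation recipe, up to publishing.
    return True


def run_plan(tasks: List[Tuple[str, Callable[[], bool]]]) -> Dict[str, str]:
    """Run tasks in order; skip everything after the first failure (assumed behavior)."""
    status: Dict[str, str] = {}
    failed = False
    for name, task in tasks:
        if failed:
            status[name] = "skipped"
            continue
        ok = task()
        status[name] = "success" if ok else "failed"
        failed = not ok
    return status


plan = [("Flow 1", flow_1), ("Flow 2", flow_2), ("Flow 3", flow_3)]
result = run_plan(plan)
print(result)
```

Because each flow is its own task, a failure points directly at the logic piece that needs attention instead of somewhere inside one large flow.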

 

Note: You can also use a reference dataset for Flow 1 if you plan to reuse the first portion of logic, now separated into Flow 1, for other use cases. As for Flow 2, if you do not plan on dynamically replacing the data sources, you can use an intermediate file instead of rerunning the Flow 2 logic, which can help reduce computational costs. In this scenario, the intermediate file would be the published output of Flow 2.

Resource(s):

  • See Build Sequence of Datasets documentation for more information on how to chain recipes in the same flow for creating reference objects and imported datasets from outputs.
  • See View of Reference Datasets documentation for more information on creating and adding reference datasets to another flow.
  • See References Page documentation for more information.
  • See Plans documentation for a general overview.
  • See Plans Page documentation for more information on the Plans page.
  • See Create a Plan documentation for more information on creating a plan.

 

Scenario 5: Redundancy in Flows Sharing Same Initial Logic Slowing Development

 

image013.png

Diagram 4

 

In our fifth complex flow scenario, depicted above in Diagram 4, multiple flows share the same initial logic piece but differ downstream with customer-specific transformations.

 

Below in Figure 6 is an example image of this flow scenario in DCTC.

 

image015.png

Figure 6

 

Running 3 separate flows in this scenario repeats the same initial logic 3 times. Reducing the number of flows is an opportunity to save on computational costs and to ease troubleshooting, and easier troubleshooting means less time spent in development.

As shown conceptually below in Diagram 5, a possible solution would be to separate the shared initial logic piece in all 3 flows here into Flow 1.

 

image016.png

Diagram 5

 

We can then move the different downstream logic pieces with customer-specific transformations into Flow 2, reducing our 3 flows down to 2 flows, as in Figure 7.

 

image018.png

Figure 7

 

Here in Flow 2, like in Figure 8, leveraging parameters would be helpful for specifying which metadata to pull in dynamically for each run.

 

image019.png

Figure 8
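To make the saving concrete, here is a minimal Python sketch of the same idea outside DCTC. Everything in it is an assumption for illustration: `shared_logic`, `customer_transform`, the `customer` parameter, and the sample rows are hypothetical and are not part of any DCTC API. It only shows that the shared logic runs once while the parameter selects the customer-specific work.

```python
from typing import Dict, List

Row = Dict[str, object]
shared_runs = 0  # counts how often the shared initial logic executes


def shared_logic(rows: List[Row]) -> List[Row]:
    """Flow 1 (hypothetical): the initial logic that all customers share."""
    global shared_runs
    shared_runs += 1
    return [r for r in rows if r["amount"] is not None]


def customer_transform(rows: List[Row], customer: str) -> List[Row]:
    """Flow 2 (hypothetical): the `customer` parameter selects what to pull in."""
    return [
        dict(r, label=f"{customer}-{r['id']}")
        for r in rows
        if r["customer"] == customer
    ]


# Made-up sample data.
raw = [
    {"id": 1, "customer": "a", "amount": 10},
    {"id": 2, "customer": "b", "amount": None},
    {"id": 3, "customer": "c", "amount": 7},
]

intermediate = shared_logic(raw)  # the shared logic runs once
outputs = {c: customer_transform(intermediate, c) for c in ("a", "b", "c")}
print(shared_runs, outputs)
```

With three separate flows, the shared step would run once per customer; here it runs once and its output is reused for every parameter value.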


Additionally, we can leverage the Plans feature to orchestrate the execution conceptually reflected in Diagram 6 below.

 

image020.png

Diagram 6

 

For a more concrete DCTC plan example, see Figure 9 below.

 

image022.png

Figure 9

 

Scenario 6: Redundant Manual Unions from Data Sources Slowing Development

 

image023.png

Diagram 7

 

In our sixth complex flow scenario conceptually depicted above in Diagram 7, there are various complex unions between data sources with different table schemas. Below in Figure 10 is an example image of this flow scenario in DCTC.

 

image025.png

Figure 10

 

Each union between a pair of data sources differs in the number of columns and in what data are present in each column. When replacing data sources or troubleshooting, it can be frustrating to navigate the different logic pieces in our monolithic complex flow. Less time will be spent in development if we can ease the troubleshooting process. Here we have 3 complex unions in our complex flow that we want to split and organize.

As shown conceptually below in Diagram 8, a possible solution would be to break up the 3 complex unions into Flow 1, Flow 2, and Flow 3 based on their respective table schema.

 

image026.png

Diagram 8
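As a rough illustration of splitting unions by table schema, the Python sketch below groups data sources by their column names and unions each group separately. It is a conceptual model only, not DCTC behavior: the source names, columns, and helper functions (`group_by_schema`, `union`) are hypothetical, and it assumes that the sources in each union share the same columns.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Table = List[Dict[str, object]]


def group_by_schema(sources: Dict[str, Table]) -> Dict[Tuple[str, ...], List[str]]:
    """Group source names by their sorted column names (the 'schema')."""
    groups = defaultdict(list)
    for name, rows in sources.items():
        schema = tuple(sorted(rows[0].keys())) if rows else ()
        groups[schema].append(name)
    return dict(groups)


def union(tables: List[Table]) -> Table:
    """Stack tables that already share a schema."""
    return [row for table in tables for row in table]


# Made-up sources: three pairs, each pair with its own schema.
sources = {
    "orders_2022": [{"id": 1, "total": 5.0}],
    "orders_2023": [{"id": 2, "total": 8.0}],
    "returns_2022": [{"id": 1, "reason": "damaged", "refund": 5.0}],
    "returns_2023": [{"id": 2, "reason": "late", "refund": 8.0}],
    "customers_a": [{"id": 1, "name": "Ada"}],
    "customers_b": [{"id": 2, "name": "Bo"}],
}

groups = group_by_schema(sources)
# One "flow" per schema: each performs only the union for its own group.
flows = {schema: union([sources[n] for n in names]) for schema, names in groups.items()}
print(len(flows))
```

Keeping one union per schema means a change to a data source touches only the flow that owns that schema.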

 

Flow 4 can then contain all the logic after the complex unions, up to publishing. A more concrete DCTC flow example of Flow 1 is shown in Figure 11.

 

image028.png

Figure 11

 

A more concrete DCTC flow example of Flow 2 would look like this image in Figure 12.

 

image029.png

Figure 12

 

As a more concrete DCTC flow example of Flow 3, here is Figure 13.

 

image030.png

Figure 13

 

Finally, here is a more concrete DCTC flow example of Flow 4 provided below as Figure 14.

 

image031.png

Figure 14

 

In Flow 4, leveraging parameters would be helpful for specifying which metadata to pull in dynamically for each run.


Then we can leverage the Plans feature to orchestrate the execution, as conceptually reflected in Diagram 9 below.

 

image032.png

Diagram 9

 

Our slowest flow, Flow 3 in this scenario, is the last flow task to finish successfully before Flow 4 starts. For a more concrete DCTC plan example, see Figure 15 below.

 

image034.png

Figure 15
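The gating behavior in this plan, where Flow 4 waits for the slowest union flow, can be sketched in Python with a thread pool. This is a conceptual model under stated assumptions, not how DCTC runs plans internally: it assumes Flows 1 to 3 run in parallel (the article does not say so explicitly), and `run_flow` and the durations are made up for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait
from threading import Lock
from typing import List

log: List[str] = []  # completion order of the flow tasks
log_lock = Lock()


def run_flow(name: str, seconds: float) -> str:
    """Hypothetical flow task: sleep to simulate work, then record completion."""
    time.sleep(seconds)
    with log_lock:
        log.append(name)
    return name


# Assumed durations; Flow 3 is the slowest, as in the scenario.
union_flows = {"Flow 1": 0.01, "Flow 2": 0.02, "Flow 3": 0.05}

with ThreadPoolExecutor(max_workers=3) as pool:
    futures = [pool.submit(run_flow, name, secs) for name, secs in union_flows.items()]
    wait(futures)  # blocks until every union flow, including the slowest, is done
    for f in futures:
        f.result()  # re-raise any exception from a flow task

run_flow("Flow 4", 0.0)  # starts only after Flows 1 to 3 have completed
print(log)
```

Under this model the total wait before Flow 4 is set by the slowest of the three union flows, not by the sum of all three.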
