Engine Works

Under the hood of Alteryx: tips, tricks and how-tos.
AZuc

I recently had to use Alteryx to solve a case that could previously be solved only by enabling Iterative Calculation in Excel, because the calculation produced a circular reference formula.

 

What exactly is a circular reference?

A circular reference happens when a formula depends, directly or through other formulas, on its own result, so the value cannot be computed in a single pass.

 

The Use Case

Let’s explain with an example. The case below is a simpler version of the original problem, based on tax calculations in a country in the southern hemisphere.

 

Known value fields are:

 

Sales Value or Gross Value: Final amount in the invoice, taxes included.

 

Reference Value: Amount established by the government based on price research, used to trigger a new tax in some cases.

 

Tax1%: First Tax percent applied to Net Value.

 

Tax2%: Second Tax percent applied to Net Value.

 

We need to calculate other fields:

 

Tax1 Amt: Tax1% * Net Value.

 

Tax2 Amt: Tax2% * Net Value.

 

Tax3 Amt: (Reference Value * Tax1%) - Tax1 Amt

 

Net Value: Gross Value - Tax1 Amt - Tax2 Amt - Tax3 Amt

 

Our goal is to find Net Value, which is the basis for other calculations. But it cannot be calculated without knowing Tax3 Amt; Tax3 Amt needs Tax1 Amt; and, to close the circle, Tax1 Amt needs Net Value. We’ve reached the circular expression condition.
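To see the circle concretely outside Alteryx, here is a minimal Python sketch (the function and variable names are illustrative, not part of the workflow). Given a guessed Net Value, every tax amount, and therefore the gross value they imply, follows in one forward pass; the difficulty is that nothing tells us the right Net Value to start from.

def implied_gross(net, ref_value, tax1_pct, tax2_pct):
    # Gross value implied by a guessed net value.
    tax1_amt = tax1_pct * net                   # Tax1 Amt = Tax1% * Net Value
    tax2_amt = tax2_pct * net                   # Tax2 Amt = Tax2% * Net Value
    tax3_amt = ref_value * tax1_pct - tax1_amt  # Tax3 Amt = (Reference Value * Tax1%) - Tax1 Amt
    # Net Value = Gross - Tax1 - Tax2 - Tax3, rearranged for Gross:
    return net + tax1_amt + tax2_amt + tax3_amt

The macro’s whole job is to find the Net Value whose implied gross equals the invoice’s actual Gross Value.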

 

Fields in blue are known values.

 

Solution Ideas

Talking it over with a teammate, we concluded we should use an iterative macro, pinning one of the variables on each iteration (in our case, Net Value) and checking whether the calculated gross value was equal to the actual Gross Value. If not, we would increment (or decrement) Net Value until the calculations reached the expected Gross Value.

 

The new problem was determining how to increment. The first idea was to increment one cent at a time until the calculation reached Gross Value. Accounting for rounding, we built the first version of the workflow, attached (First WF.yxmd). As the first “guess” for Net Value, I used half of Gross Value.

 

On the first execution, there was an “iteration limit reached” error. Raising the iteration limit to 10000 and adding an iteration counter (based on the constant [Engine.IterationNumber]) showed how many tests were needed to reach the correct value.

 

My first test was with a single record with the following values:

 

  • Gross Value: 100
  • Reference Value: 110
  • Tax1%: 0.3
  • Tax2%: 0.16
  • Net Value (the first “guess”): 50 (half of Gross Value).

 

The workflow took 4.4 seconds to run, with 776 iterations.
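Outside Alteryx, the cent-by-cent logic of First WF.yxmd can be sketched in Python as below (illustrative code, not the macro itself). With the record above it stops after exactly 776 increments, matching the iteration count reported:

# Step Net Value one cent at a time until the gross value implied by
# the guess matches the actual gross value at 2-decimal precision.

def implied_gross(net, ref_value, tax1_pct, tax2_pct):
    tax1_amt = tax1_pct * net
    tax2_amt = tax2_pct * net
    tax3_amt = ref_value * tax1_pct - tax1_amt
    return net + tax1_amt + tax2_amt + tax3_amt

gross, ref, t1, t2 = 100.0, 110.0, 0.30, 0.16
net = gross / 2            # first guess: half of Gross Value
iterations = 0
while round(implied_gross(net, ref, t1, t2), 2) != round(gross, 2):
    # Move one cent toward the target on every pass.
    net += 0.01 if implied_gross(net, ref, t1, t2) < gross else -0.01
    iterations += 1
print(iterations, round(net, 2))   # 776 iterations, Net Value 57.76

Because the step is fixed, the iteration count grows linearly with the distance between the first guess and the answer.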

 

New Issue

Now that we’d succeeded, I went to the customer to validate some assumptions. There were two pieces of information to worry about: the volume of data to be processed is around 500 thousand records, and the precision must be at the 4th decimal place.

 

Testing the same macro with that single record, but adapted for precision at the 4th decimal place, the workflow took 4:23 minutes and iterated 77,587 times; a step one hundred times finer means roughly one hundred times more iterations. The result, although correct for that one record, was unfeasible for 500 thousand calculations.

 

 


 

The Solution

We then started to look for other approaches to reduce the number of iterations and remembered a technique from years ago, back in my programming-language days (it was VBA for Excel, by the way), called binary search.

 

The basic idea is to always test the value halfway between the two extremities, cutting the search interval in half on every iteration. Again, let’s take an example to explain.

 

Assume your values vary between 0 and 100 and you are looking for 70. In the first iteration we take the average of the maximum and the minimum, in our case (100 + 0)/2 = 50. Is 50 higher or lower than 70? Lower! So the new minimum is 50. Let’s detail each step of the logic below:

 

 

Iteration   Min     Max     Average   Average < 70?   Action
1           0       100     50        Yes             New Min = 50
2           50      100     75        No              New Max = 75
3           50      75      62.50     Yes             New Min = 62.50
4           62.50   75      68.75     Yes             New Min = 68.75
5           68.75   75      71.88     No              New Max = 71.88
6           68.75   71.88   70.32     No              New Max = 70.32
7           68.75   70.32   69.54     Yes             New Min = 69.54
8           69.54   70.32   69.93     Yes             New Min = 69.93
9           69.93   70.32   70.13     No              New Max = 70.13
10          69.93   70.13   70.03     No              New Max = 70.03
11          69.93   70.03   69.98     Yes             New Min = 69.98
12          69.98   70.03   70.01     No              New Max = 70.01
13          69.98   70.01   70.00     EQUAL           STOP

 

 

Thirteen iterations! A lot, isn’t it? But observe how aggressively the distance to the goal shrinks on each iteration:

 

 

Iteration   Average - Target   Difference
1           50 - 70            -20
2           75 - 70            +5
3           62.50 - 70         -7.50
4           68.75 - 70         -1.25
5           71.88 - 70         +1.88
6           70.32 - 70         +0.32
7           69.54 - 70         -0.46
8           69.93 - 70         -0.07
9           70.13 - 70         +0.13
10          70.03 - 70         +0.03
11          69.98 - 70         -0.02
12          70.01 - 70         +0.01
13          70.00 - 70         0

 

 

Compared to 776 iterations for precision at the 2nd decimal place, 13 looks like a reasonable number. Now let’s put this workflow into practice and check the results.
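As a side note, the stepping logic in the tables condenses to a few lines of Python (a sketch; using Decimal with half-up rounding reproduces the table’s 2-decimal arithmetic exactly):

# Halve the [Min, Max] interval until the midpoint lands on the target.
from decimal import ROUND_HALF_UP, Decimal

target = Decimal("70")
lo, hi = Decimal("0"), Decimal("100")   # Min and Max
iteration = 0
while True:
    iteration += 1
    # Average of the extremities, rounded to 2 decimals as in the table.
    mid = ((lo + hi) / 2).quantize(Decimal("0.01"), ROUND_HALF_UP)
    if mid == target:
        break         # EQUAL, STOP
    if mid < target:
        lo = mid      # midpoint too low, new Min
    else:
        hi = mid      # midpoint too high, new Max
print(iteration, mid)  # 13 iterations, ending at 70.00

Each pass halves the remaining interval, so the iteration count grows with the logarithm of the search range divided by the precision, not with the range itself.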

 

I made a slight adjustment to the logic. Instead of tracking maximums and minimums, I set the next Net Value to be tested to the current Net Value plus half the difference between the actual Gross Value and the calculated gross value.
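In the same illustrative Python as before, the adjusted logic looks like the sketch below. This is a simplified stand-in for the attached macro: it converges in about ten passes on this record, and the workflow’s exact counts (6 and 8, reported next) depend on details of its rounding and stopping test.

# No explicit Min and Max: nudge the Net Value guess by half of the gap
# between the actual gross value and the gross the guess implies.

def implied_gross(net, ref_value, tax1_pct, tax2_pct):
    tax1_amt = tax1_pct * net
    tax2_amt = tax2_pct * net
    tax3_amt = ref_value * tax1_pct - tax1_amt
    return net + tax1_amt + tax2_amt + tax3_amt

gross, ref, t1, t2 = 100.0, 110.0, 0.30, 0.16
precision = 2                  # decimal places required (2 or 4)
net = gross / 2                # first guess: half of Gross Value
iterations = 0
while round(implied_gross(net, ref, t1, t2), precision) != round(gross, precision):
    net += (gross - implied_gross(net, ref, t1, t2)) / 2
    iterations += 1
print(iterations, round(net, precision))

Each pass removes a fixed fraction of the remaining gap, so the error shrinks geometrically; that is why tightening the precision from 2 to 4 decimal places adds only a few iterations instead of multiplying them by a hundred.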

 

In the first test (2 decimal places of precision), our workflow took 1.7 seconds and iterated 6 times to reach the result. Much better than the original 776 iterations.

 

Then I tested with precision at the 4th decimal place. The workflow took 1.1 seconds, with 8 iterations instead of 77,587. The final workflow is attached (Final WF.yxmd).

 

My final test was running 500,000 random records through both approaches. Let’s see the results:

 

First WF: 3:03 hours

Final WF: 9.8 seconds

 


 

Circular references are a much larger topic than the case we saw here. This case could be solved with an iterative macro; others may require different approaches. The (adapted) binary search was the key to making the case feasible.

 

I'd like to know about other use cases where this technique is useful. Who is up for that challenge? Please reply to this thread with your suggestions!