Engine Works

jdunkerley79 · ‎06-02-2020

As a bit of a thought experiment, I wondered how hard it would be to create a cubic spline interpolation within Alteryx. As with many of my games, BaseA rules apply. (Editor's note: Definitely read the BaseA article - it's an explanation of the insane rules that the ACEs created to ramp up puzzle solving.)

Stealing an idea from @TashaA, I thought I would do it in both Python and Alteryx from first principles. A quick shout out to MathAPI, a handy site used to render all the LaTeX to SVG.

So let's start by reviewing how to create a cubic spline and then build it up. I chose to use the algorithm as described in Wikiversity, specifically with type II simple boundary conditions. I'm not going through the maths but will define the steps to build the spline.

Building a Tridiagonal Matrix and Vector

The first step is, given an X array and a Y array of equal length n (greater than 2), we want to build a tridiagonal matrix which we will then solve to produce the coefficients for the piece-wise spline. The goal of the spline is that it hits every point (x, y) and that the first and second derivatives match at these points too.

Sticking with the notation in the paper, let's define H to be an n-1 length array of the differences in X:

for

A tridiagonal matrix is a square matrix where all values except for the main diagonal and first diagonals below and above this. For example:

1   2   0   0
2   3   2   0
0   2   3   2
0   0   2   1

One advantage of a tridiagonal matrix is that they are fairly straightforward to invert and solve linear equations based on them. For the sake of coding up the algorithm - let's define B to be the n length array holding the diagonal elements, A to be the n-1 length array of the diagonal above this and C to be the n-1 length array of the diagonal below:

b0   c0    0    0
a0   b1   c1    0
 0   a1   b2   c2
 0    0   a2   b3

Please note the indexes here are different from those used in the Wikiversity article, as they align with a standard array starting at 0. For the spline, these are arrays given by:

for

for

Using the simple boundary condition that the second derivative is equal to 0 at the end, gives the values for c₀ and a_n-2 both equal to 0. This can easily be coded up in Python:

from typing import Tuple, List

def compute_changes(x: List[float]) -> List[float]:
    return [x[i+1] - x[i] for i in range(len(x) - 1)]

def create_tridiagonalmatrix(n: int, h: List[float]) -> Tuple[List[float], List[float], List[float]]:
    A = [h[i] / (h[i] + h[i + 1]) for i in range(n - 2)] + [0]
    B = [2] * n
    C = [0] + [h[i + 1] / (h[i] + h[i + 1]) for i in range(n - 2)]
    return A, B, C

The next step is to compute the right-hand side of the equation. This will be an array of length n. For notation, let's call this D (the same as in the Wikiversity article):

for

Implementing this in Python looks like:

def create_target(n: int, h: List[float], y: List[float]):
    return [0] + [6 * ((y[i + 1] - y[i]) / h[i] - (y[i] - y[i - 1]) / h[i - 1]) / (h[i] + h[i-1]) for i in range(1, n - 1)] + [0]

Solving the Tridiagonal Equation

To solve a tridiagonal system, you can use the Thomas Algorithm. Mapping this onto the terminology above, we first derive length n vectors C' and D':

for

Having worked out C' and D', calculate the result vector X:

for

The implementation of this in Python is shown below:

def solve_tridiagonalsystem(A: List[float], B: List[float], C: List[float],  List[float]):
    c_p = C + [0]
    d_p = [0] * len(B)
    X = [0] * len(B)

    c_p[0] = C[0] / B[0]
    d_p[0] = D[0] / B[0]
    for i in range(1, len(B)):
        c_p[i] = c_p[i] / (B[i] - c_p[i - 1] * A[i - 1])
        d_p[i] = (D[i] - d_p[i - 1] * A[i - 1]) / (B[i] - c_p[i - 1] * A[i - 1])

    X[-1] = d_p[-1]
    for i in range(len(B) - 2, -1, -1):
        X[i] = d_p[i] - c_p[i] * X[i + 1]

    return X

Calculating the Coefficients

So the last step is to convert this into a set of cubic curves. To find the value of the spline at the point x, you want to find j such that x_j < x < x_j+1. Let's define z as:

z has the property of being 0 when x = x_j and 1 when x = x_j+1. The value of spline at x, S(x) is:

Now to put it all together and create a function to build the spline coefficients. The final part needed is to create a closure and wrap it up as a function that will find j and then evaluate the spline. There is an excellent library in Python calledbisectthat will do a binary search to find j easily and quickly.

The code below implements this and also validates the input arrays:

def compute_spline(x: List[float], y: List[float]):
    n = len(x)
    if n < 3:
        raise ValueError('Too short an array')
    if n != len(y):
        raise ValueError('Array lengths are different')

    h = compute_changes(x)
    if any(v < 0 for v in h):
        raise ValueError('X must be strictly increasing')

    A, B, C = create_tridiagonalmatrix(n, h)
    D = create_target(n, h, y)

    M = solve_tridiagonalsystem(A, B, C, D)

    coefficients = [[(M[i+1]-M[i])*h[i]*h[i]/6, M[i]*h[i]*h[i]/2, (y[i+1] - y[i] - (M[i+1]+2*M[i])*h[i]*h[i]/6), y[i]] for i in range(n-1)]

    def spline(val):
        idx = min(bisect.bisect(x, val)-1, n-2)
        z = (val - x[idx]) / h[idx]
        C = coefficients[idx]
        return (((C[0] * z) + C[1]) * z + C[2]) * z + C[3]

    return spline

The complete Python code is available as a gist.

Testing the Spline

As always, it is essential to test to make sure all is working:

import matplotlib.pyplot as plt

test_x = [0,1,2,3]
test_y = [0,0.5,2,1.5]
spline = compute_spline(test_x, test_y)

for i, x in enumerate(test_x):
    assert abs(test_y[i] - spline(x)) < 1e-8, f'Error at {x}, {test_y[i]}'

x_vals = [v / 10 for v in range(0, 50, 1)]
y_vals = [spline(y) for y in x_vals]

plt.plot(x_vals, y_vals)

The above creates a small spline and ensures that the fitted y values match at the input points. Finally, it plots the results:

Recreating in Alteryx

So that's the full process, and now to rebuild it in Alteryx using BaseA. For the input, the macro takes two separate inputs - a table of KnownXs and KnownYs and a list of target Xs. Again, the first task is to build H, A, B, C, D from the inputs:

Using some Multi-Row Formula tools, it is fairly easy to create these. The expressions are shown below. In all cases the value is a Double, and NULL is used for rows that don't exist:

H=[X]-[Row-1:X]
A=IIF(ISNULL([Row+1:H]),0,[H]/([H]+[Row+1:H]))
C=IIF(ISNULL([Row-1:H]),0,[H]/([H]+[Row-1:H]))
B=2

Then I use a Join (on row position) and a Union to add the last row to the set. Finally, D is given by:

IIF(IsNull([Row-1:X]) or IsNull([Row+1:X]),
    0,
    6 * (([Row+1:Y]-[Y])/[H] - ([Y]-[Row-1:Y])/[Row-1:H]) / ([H]+[Row-1:H])
)

Now to solve the produced system. In order to save on storage, instead of creating C' and D', the Multi-Row Formula tools update the values of C and D:

C=IIF(ISNULL([Row-1:X]),[C]/[B],IIF(ISNULL([Row+1:X]),0,[C]/([B]-[Row-1:C]*[Row-1:A])))
D=IIF(ISNULL([Row-1:X]),[D]/[B],([D]-[Row-1:D]*[Row-1:A])/([B]-[Row-1:C]*[Row-1:A]))

To compute the solution vector, M, it is necessary to reverse the direction of the data. While you can use Row+1 to access the next row in a Multi-Row Formula tool, it won't allow a complete full computation backwards. To do this, add a Record ID and then sort the data on it into descending order. After that, M can be calculated using another multi-row formula:

M=IIF(IsNull([Row-1:X]),[D],[D]-[C]*[Row-1:M])

After reverting the sort, we now have all the inputs and can move to compute the coefficients for each piece of the spline:

One small trick here is to skip the first row and join back to the same stream. This gives x, y and M for the next row. The coefficients are then computed using a normal formula tool:

CubeCoefficient=([EndM]-[M])*[H]*[H]/6
SquareCoefficient=[M]*[H]*[H]/2
LinearCoefficient=([EndY]-[Y]-([EndM]+2*[M])*[H]*[H]/6)
Constant=[Y]

The final challenge is to reproduce thebisectfunctionality to find the row for each wanted X.

In this case, using a Multiple Join allows creating a full-outer join of known and wanted X values. After this, add the first KnownX at the top and use a Multi-Row Formula to fill in the gaps.

The last step is just to compute the spline. First, join to get the coefficients and then use a standard Formula tool to calculate the fitted value.

Having wrapped it up as a macro, do a quick test to see it worked:

The final macro and test workflow can be downloaded from Dropbox (macro, workflow).

Wrapping Up

Hopefully, this gives you some insight into how to build a cubic spline and also how to move some concepts from programming straight into Alteryx.

Ken_Black · ‎06-04-2020

James,

I previously wrote a comment for you on this article, but unfortunately it disappeared before I could send it! I wanted to tell you that you took me back to the late 1980's with this article. I remember the first time I ever used natural cubic splines. I wrote the code using Numerical Recipes and when it was finished, the results were beautiful! I fell in love with natural cubic splines at that time and have never forgotten about them.

Congrats on another great article. I really appreciate your work!

Ken

One of my go-to books