by Daniel Kübler, Linn von Pein (@LvP), Merve Tas Bangert (@mtasbangert)
The Problem
You want to use advanced mathematical methods or special statistics in Alteryx without doing it “by hand,“ for example, calculating a regression function or its parameters. Of course, it is possible, as shown by the community, with a few tools. But if you want to go deeper in statistics and ANOVA or ANCOVA calculations, it is not that easy to use fast mathematical methods like the multiplication of matrices with the given tools.
Alteryx has possibilities to extend the broad range of base tools with more advanced tools that bring us the capability of interfacing professional programming languages e.g. Gnus R or Python.
Is it too much hardcore programming for the average user? Possibly. But let’s talk about combining the best of both worlds.
First, you need to know what’s in your data and how it looks. This easy-to-do task is done by exploring your data with the basic and common Alteryx tools like the Browse tool or the Data Cleansing tool prior to the Browse tool.
Maybe you want to go a bit more into detail and use some filters or formulas to calculate something by hand. Or you eventually want to use a Summarize tool in combination with your formulas to go even deeper.
So far, so good. Now we know our data – at least, we think.
Let’s go deeper down the rabbit hole.
Source: Tenor
For example, your data shows you a connection between plant growth, watering, and lighting. Logic tells us that if we give the plant enough water and light, it will grow better. Sure.
But what is more important or beneficial for the plant? Or what is the best ratio between light and water?
For this, we need to use an ANOVA or better an ANCOVA statistical analysis.
It stands for ANalysis Of VAriances or ANalysis of COVAriences. I don’t want to go into detail on the statistics of this calculations, but they use multiplication of matrices, as mentioned above, to calculate the effects between growth, light and water in our example.
You can read more here and here.
If you know your math and statistics it’s not more than 3 short lines of code in Python, so easy peasy lemon squeezy.
But if you want to do it by yourself you have to put much more effort (and of course brains) in it to reference and calculate the right things with the right things e.g. in Excel or--more horribly--by hand.
Source: GIPHY
The Solution: Developer Tools
… and deeper….
Source: GIPHY
Let’s talk about a common problem that each one of us has.
I found some old medicine lying around wildly in the house. What should I do with it? Can I still use it? After looking up the expiration date, you know what to do.
But how do pharmaceutical companies calculate their expiration date?
They use a shelf life calculation, which is actually an ANCOVA calculation.
So, from this point on, we now know:
- What we want to calculate → the shelf life
- What data/data structure we need → batch information, timepoints and measured concentration of analytes.
How can we build this rather complex statistical approach using Alteryx Designer?
Let’s dive into it as if it were a recipe:
First connect your data source to your Alteryx workflow and select your specific columns and your analysis parameters.
Second, use data prep to prepare the dataset to the right structure to analyze it (cleansing, restructuring etc.)
Third, join tables and check again the consistency of all collected information for the analysis.
Fourth, put the cleaned and prepared data into the Python tool, add a bit of statistics, matplotlib, and pandas to it (this is where the magic happens).
Stir thoroughly and pour it into a reporting mold.
Et voilà!
We have successfully created our stability analysis workflow in Alteryx.
Source: GIPHY
Conclusion
We have shown that Alteryx Designer can do some deep statistical calculations with a Python tool and how we can build a complete stability workflow for usage in the pharmaceutical industry. We have also shown the capabilities of Designer for data preparation and reporting.
Alteryx is a powerful analysis software capable of using many built-in tools and interface technology to connect other capable analysis tools and databases.
The lively community often helps to find good solutions and to think outside the box.
And if one can do some programming, the possibilities are endless in Alteryx Designer.