Engine Works

Under the hood of Alteryx: tips, tricks and how-tos.
jdunkerley79
ACE Emeritus
ACE Emeritus

abacus1.png

 

The Abacus add in is a collection of custom functions that make writing expressions in Alteryx easier or gives completely new functionality (such as variables). Originally created to make my life easier, they have grown into a powerful extension for Alteryx that can speed up creation of complicated formulae. They focus in on a few key themes:

 

  • Date and Time
  • Probability Distributions
  • Data Generation
  • General Utility
  • Variables (new in v1.4)

In general, the functions have come either from my own wants for easier ways to do things or more commonly from ideas when answering questions in the community. This document gives an overview of the collection as it stands at version 1.4 (which should be released very soon).

 

Please always remember when using any SDK tool or custom function in a workflow, that the add in will then need to be installed on any machine you wish to run the workflow on.

 

Installation

 

installation1.png

 

The best way to install the add in is to use the Analytic App Installer. This app should download from GitHub, then extract and install the add in into Alteryx. If it doesn't work, you can get the direct download following the instructions below.

 

The newest release is always available from GitHub. In each release you will find the following kind of structure:

 

installation2.png

 

The two zip files contain the actual add in (AlteryxAbacus.v.1.3.0.0.zip) and the set of test workflows (AlteryxAbacus.v.1.3.0.0.Tests.zip). The Manual.pdf contains a snapshot of the documentation wiki produced when the release was packaged. Finally, the source code is included as both a zip and a tar.gz file.

 

To install onto Alteryx Designer (or Server), you just need to download the main zip file. You can then extract it and run Install.bat. This should install the functions into Alteryx. To uninstall, run Uninstall.bat. These scripts have been tested on Windows 10 but will need administrator access to run. There is also a couple of Windows 7 compatible scripts (Install Win7.bat, Uninstall Win7.bat) included in the zip file.

 

Date and Time

 

These functions make working with and manipulating dates easier. They break into a couple of collections:

 

DatePart: Gets part of a date or time

 

These return the specified part as a number. Some of the functions now have built-in equivalents.

 

  • DatePart(interval, dt): Replicates the SQL DatePart function, getting a specified part of the datetime input
  • Century(dt): Gets the century for a date
  • Year(dt): Gets the four-digit year
  • Quarter(dt): Gets the quarter of the date [1-4]
  • Month(dt): Gets the month number for a date or datetime [1-12]
  • Day(dt): Get the day of the month for a date or datetime [1-31]
  • OrdinalDay(dt): Gets the day of the year [1-366]
  • WeekNum(dt): Gets the week number of a Datetime, with Sunday as start of week and January 1st in Week 1 [1-53]
  • Weekday(dt): Gets the day of the week [Sunday (0) through to Saturday (6)]
  • Hour(dt): Get the hour part of a DateTime or Time [0-23]
  • Minute(dt): Gets the minute part of a DateTime or Time [0-59]
  • Second(dt): Gets the second part of a DateTime or Time [0-59]

 

Period Start and End: Gets the beginning or end of specified periods

 

It is often useful to be able to round a date down or up to the period start or end. These functions provide those capabilities:

 

  • YearStart(dt): Get First Day of Year
  • YearEnd(dt): Get Last Day of Year
  • QuarterStart(dt): Get First Day of Quarter
  • QuarterEnd(dt): Get Last Day of Quarter
  • MonthStart(dt): Get First Day of Month
  • MonthEnd(dt): Get Last Day of Month
  • WeekStart(dt): Get first Sunday before or equal to date
  • WeekEnd(dt): Get first Saturday after or equal to date

 

Parse and Create: Shorthand to make dates from strings or numbers

 

These are some quick functions for building dates either from strings, numbers, or other datetimes:

 

  • DateFromDMY(DMY): Parse a string in Day Month Year format to a Date (copes without leading 0s and different separators)
  • DateFromMDY(MDY): Parse a string in Month Day Year format to a Date (copes without leading 0s and different separators)
  • MakeDate(Year, Month, Day): Create a new date from Year, Month, Day
  • MakeDateTime(Year, Month, Day, Hour, Minute, Second): Create a new DateTime from Year, Month, Day, Hour, Minute, Second
  • MakeTime(Hour, Minute, Second): Create a new time from Hour, Minute, Second
  • ToDate(dt): Truncate a DateTime to a Date
  • ToDateTime(dt): Appends midnight to a Date to create a DateTime
  • ToTime(dt): Get the time from a DateTime, Date (defaults to 00:00:00) or a Time.

 

Other Date Functions: Business days and other bits and bobs

 

Last few functions handle working day calculations and any other miscellaneous date tools I needed:

 

  • DateAdd(Date,i,u): Equivalent to DateTimeAdd but returning a Date
  • BusinessDays(StartDate, EndDate): Number of weekdays between two dates
  • Workday(dt,days): Add or remove a specified number of weekdays to a date (similar to Excel's Workday function but no support for holidays)
  • IsLeapYear(Year): Is a year a leap year (takes a year not a date as an input use IsLearYear(Year([Date]))

 

Probability Distributions

 

Based on the Boost library, these functions allow computation of values on various probability curves:

 

  • ChiDist(X, DegreesOfFreedom): Compute CDF on Chi Squared distribution
  • ChiInv(P, DegreesOfFreedom): Compute inverse CDF on Chi Squared distribution
  • LogNormDist(X, Location, Scale, Cuml): Compute PDF or CDF on Log Normal distribution
  • LogNormInv(P, Location, Scale): Compute inverse CDF on Log Normal distribution
  • NormDist(X, Mean, StDev, Cuml): Compute PDF or CDF on Normal distribution
  • NormInv(X, Mean, StDev): Compute inverse CDF on Normal distribution
  • TDist(X, DegreesOfFreedom): Compute two tailed Student T distribution
  • TInv(X, DegreesOfFreedom): Compute inverse two tailed Student T distribution
  • RandTriangular(Prob, Min, Mode, Max): Produces a random number from a triangular distribution

 

Data Generation

 

These functions help with the generation of fake data.

 

  • RandomIPAddress(CIDR): Generates a random IP address, can be restricted to be within a CIDR block
  • RandomItem(Item1, Item2, Item3): Picks a random item from the input list (can be list of numbers or list of strings but must be one type)
  • RandomString(Pattern, CharSet1, ...): Generates a random string using a template to control format

 

General Utility

 

These functions don't really fit in any other category.

 

Handling Nulls

 

Functions to handle dealing with NULL easier.

 

  • Coalesce(Val1, Val, ... 😞 Given list of values return first non null value (C++)
  • IfNull(Val1, Val2): If first value is null return second

 

Geometry

 

  • Deg(Radians): Convert radians to degrees
  • Rad(Degrees): Convert degrees to radians
  • HexBinX(X, Y, R): Given an X,Y point and optional radius, get X co-ordinate of hexagonal bin's centre
  • HexBinY(X, Y, R): Given an X,Y point and optional radius get Y co-ordinate of hexagonal bin's centre

 

Aggregration Across Columns

 

  • Avg(Value1, Value2, ...): Average of a list of values ignoring NULL
  • Count(Value1, Value2, ...): Count of a list of values ignoring NULL
  • Sum(Val1, Val, ... 😞 Sum of a list of values ignoring NULL

 

Roman Numerals

 

A couple of functions to convert to and from Roman numerals (based on an idea from Ken Black)

 

  • ToRoman(Value): Given a number between 0 and 5000, convert to Roman numerals
  • FromRoman(Value): Given Roman numeral convert to a number

 

RangeJoin

 

This is a specialist function to allow for a simple VLookup style join. If you take a list of values in ascending order and join together into a comma separated string, then this function will return first value equal to or above the input value.

 

  • RangeJoin(Value, RangeCSV): Finds the first value in the RangeCSV parameter which is greater than or equal to the Value argument.

 

General Mathematical Functions

 

  • Int(Value): Rounds a number to nearest integer either equal to the number or closer to 0.
  • Modulo(Dividend, Divisor): General Double Based Modulo function
  • Quotient(Dividend, Divisor): Returns the integer part of a division
  • Sign(Value): Determines the sign of a number (-1 if less than 0, 1 if greater than, 0 otherwise)
  • Phi(): Returns the golden ratio constant.

 

General String Functions

 

  • FindStringLast(String, Target): Gets the position from the left of the last instance of a term in a string
  • LeftPart(String, Separator): Gets the text before the first instance of a separator
  • RightPart(String, Separator): Gets the text after the first instance of a separator
  • Split(String, Delimeter, Index): Splits a string into tokens and then returns the specified instance

 

Developer Functions

 

These are quite specialised but make development of new functions or advanced formulae easier.

 

  • Version(): Gets the major and minor version of the Alteryx Engine as a number
  • LogToFile(ReturnValue, FileName, Message): Write a message to a log file
  • ReportError(Condition, Message, ReturnValue): Raise an error from a formula if a condition is met, otherwise return a specified value

 

Variables

 

Variables are brand new in version 1.4 and add a whole new dimension of capabilities within formula tool. You can do things like implement a Newton Raphson solver using the generate rows tool or create a grouped running total in a formula tool without resorting the data. They are very experimental so please use with caution. Documentation on these new functions is also not yet complete.

 

Variables come in 3 types:

 

  • Number (stored as a double but can be read from any numeric type)
  • String (stored as a Unicode value but any text or datetime field as input)
  • List (a resizable array of values allowing addition, removal and random access)

 

Variables exist for the execution of a workflow and can be passed between tools. I have not done sufficient experimentation with macros to know how they play with these. The list of functions for working with variables is:

 

  • VarPrint(): Lists all current variable names and values to a string table.
  • VarReset(): Clears the internal caches of all variables. Can take a Key parameter to delete just that variable.
  • VarNum(Key[, Value]): Retrieves (if no Value argument passed) or stores a number value in the Key variable.
  • VarNumExists(Key): Returns true if a number variable exists, false otherwise.
  • VarText(Key[, Value]): Retrieves (if no Value argument passed) or stores a string value in the Key variable.
  • VarTextExists(Key): Returns true if a string variable exists, false otherwise.

 

Lists

 

Interacting with a list of values requires some more functions. You need the ability to check the list exists, get the length, access a value, set a value, and delete a value. Indexes start from 0.

 

  • VarListExists(Key): Returns true if a list variable exists, false otherwise.
  • VarListLength(Key): Returns the current length of a list if it exists, or NULL if the list does not exist.
  • VarListAdd(Key, Value): Adds an item to a list variable (creating a new list if needed) at the end of the list.
  • VarListInsert(Key, Index, Value): Inserts an item into a list (creating a new list if needed) at specified index. If Index is negative accesses from the end of the array backwards.
  • VarListSet(Key, Index, Value): Sets an item in an existing list at specified index. If Index is negative accesses from the end of the array backwards.
  • VarListRemove(Key, Index): Removes an item from a list variable at specified index. If Index is negative accesses from the end of the array backwards.
  • VarListGet(Key, Index): Gets a value from a list variable at the specified index. If Index is negative accesses from the end of the array backwards.

 

An Overview of the Code Behind

 

This whole project is open source and if you would like to contribute, I would value the submission. All of the code is open source and available on GitHub, below is a quick overview of the code and set up I use to develop the functions.

 

These functions are built on top the Custom Function SDK. There are two ways to create a new function - either in XML or in C++. The code is structured as follows:

 

├── AlteryxAbacus
│   ├── AlteryxAbacus.aps
│   ├── AlteryxAbacus.cpp
│   ├── AlteryxAbacus.h
│   ├── AlteryxAbacus.rc
│   ├── AlteryxAbacusUtils.cpp
│   ├── AlteryxAbacusUtils.h
│   ├── AlteryxAbacus.vcxproj
│   ├── AlteryxAbacus.vcxproj.user
│   ├── ChiSquaredDistribution.cpp
│   ├── DateTimeFunctions.cpp
│   ├── EngineVersion.cpp
│   ├── EngineVersion.h
│   ├── Generator.cpp
│   ├── HexBins.cpp
│   ├── LogNormalDistribution.cpp
│   ├── NormalDistribution.cpp
│   ├── resource.h
│   ├── RomanNumerals.cpp
│   ├── stdafx.cpp
│   ├── stdafx.h
│   └── StudentTDistribution.cpp
├── AlteryxAbacus.sln
├── AlteryxAbacus.sln.DotSettings.user
├── DateUtils.xml
├── MathUtils.xml
├── MiscUtils.xml
└── StringUtils.xml

 

The XML files contain the function definitions (both for the macro functions and the C++ based ones). They are separated by category. The C++ code is all contained in a single Visual Studio project. In general, I again keep each set of functionalities in its own cpp file making them easier to handle.

 

In terms of tooling to work on these, you will need a reasonable text editor (I would recommend Visual Studio Code) and a copy of Visual Studio to work on the C++ code (Community edition is fine). In addition, I use the boost library (currently I use v1.67) which you will need to download and reference for the C++ code.

 

Testing

 

testing1.png

 

Testing is crucial when creating functions. I use the CReW macros to provide the List Runner macro. The following scripts and macros are in the repository to allow local testing:

 

├── Install - Debug.bat
├── Install - Release.bat
├── LinkDebug.ps1
├── CreateSet.png
├── CreateSet.yxmc
├── ResultCompare.yxmc
├── RunUnitTests.ps1
└── RunUnitTests.yxmd

 

The Install - Debug.bat, Install - Release.bat and LinkDebug.ps1 will either install the output of a debug or release build into Alteryx. The LinkDebug script establishes a symbolic link allowing quick modification while developing. The RunUnitTests.ps1 script will run the RunUnitTests.yxmd workflow in all installed instances using the AlteryxEngineCmd.exe. This workflow seeks all workflows in folders ending with .Test in the folder name. In general, I keep a test workflow for each function in a folder named after the XML file containing the functions. For example, to test the StringUtils:

 

├── StringUtils.Test
│   ├── ContainsTest.yxmd
│   ├── EndsWithTest.yxmd
│   ├── FindStringLastTest.bak
│   ├── FindStringLastTest.yxmd
│   ├── FromRomanTest.yxmd
│   ├── LeftPartTest.yxmd
│   ├── RightPartTest.yxmd
│   ├── SplitTest.yxmd
│   ├── StartsWithTest.yxmd
│   └── ToRomanTest.yxmd
├── StringUtils.xml

 

Documentation and Test Coverage

 

workflow.png

 

A new addition in version 1.4 is a new workflow Coverage.yxmd which performs various static checks on the library. It looks for the following issues:

 

  • The name is in upper case
  • The text inserted matches the name of the function
  • If an XML function, fixed number of parameters and that all are used
  • If a C++ function, check entry point matches name
  • A test workflow for the function exists
  • An entry exists in the function list
  • A documentation entry has been created in the wiki

This workflow uses a lot of convention to ensure that the add in is well-tested and documented. By using this technique, I have found various issues that I had missed in previous versions. While I expect some work would be needed, the Coverage.yxmd workflow could easily be adjusted to check any custom function add in.

 

Packaging

 

The last of the project consists of tools for packaging and releasing:

 

├── CreateRelease.ps1
└── vswhere.exe

 

These scripts are responsible for creating a new release package. This will locate Visual Studio and run a build of the C++ code in Release mode, before using the local testing set up to install and run the unit tests. Assuming this is successful, it will then create a zip file containing all the XML files and the compiled DLL file and the following installation scripts:

 

├── Install.bat
├── Installer.ps1
├── Install Win7.bat
├── Uninstall.bat
├── Uninstaller.ps1
├── Uninstall Win7.bat

 

These scripts allow for manual installation if the analytic app fails. The final steps of creating a release is to package up the test workflows into their own zip file and to create a PDF document based on the wiki pages (this is done using Pandoc).

 

Wrapping Up

 

Hopefully these functions are helpful, and the notes here give you some details on how to use them and how they have been built and tested. If you have any issues or questions, reach out to me on the community (@jdunkerley79) or raise an issue on GitHub.

 

Likewise, if you would like to contribute with new functions or code corrections, I will welcome the Pull Requests!