Alteryx Designer is a very popular client-side data wrangling tool for data scientists and engineers. It also has a server setup for collaboration and scheduling in an enterprise setting. Once Alteryx is integrated into a mix of other applications (web, batch, etc.), an interesting problem arises: how to keep track of data flow, failures, and log management.
Consider a scenario where a Java or Python-based application triggers Alteryx workflows on a server, which in turn call a REST API to persist the data. Here an event or transaction starts in the Java or Python application, progresses through the Alteryx workflows, and ends by invoking a REST API. Developing this architecture raises many considerations about how data and processing flow through these disparate applications. Questions like the following need to be addressed:
Event Correlation: The first consideration is the ability to correlate an event as it flows through these applications. A uniquely generated ID, such as a UUID obtained from a library call (written here as Math.UUID()), can be used. When there are multiple process flows, this UUID can be prefixed with the name of the process or application, see below:
CorrelationId = BalRepAlteryx + Math.UUID() = BalRepAlteryxf56eaf7f-d8b4-4aeb-87a0-dcbe059339ae
All log messages across the applications can use this format when logging to Splunk or any other log aggregator. The person investigating can then bring up all the messages in chronological order using this correlation ID (UUID) to get a complete picture of what is going on across the applications.
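As a minimal sketch of the prefixing scheme above, the following Python snippet builds a correlation ID from a process name and a random UUID. The helper name `new_correlation_id` is hypothetical, and Python's standard `uuid.uuid4()` stands in for the `Math.UUID()` notation used above.

```python
import uuid


def new_correlation_id(process_name: str) -> str:
    """Return the process name followed by a random (version 4) UUID.

    Hypothetical helper: the name and signature are illustrative only.
    """
    return f"{process_name}{uuid.uuid4()}"


# "BalRepAlteryx" is the example prefix used in the text above.
correlation_id = new_correlation_id("BalRepAlteryx")
print(correlation_id)
```

The ID would be generated once, by the application that starts the event, and then passed along unchanged to every downstream application so that each one logs the same value.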
Handshake: As processing moves from one application to the next, each application should perform a proper handshake through its logs so that failures are easier to trace and debug. The following are some of the attributes that should be logged on entry and exit:
Based on these attributes, questions such as the following can be answered:
Sample JSON data written to the logs would look like:
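One possible way to emit such entry and exit records from a Python application is sketched below. The field names (`timestamp`, `correlationId`, `application`, `stage`, `status`, `detail`) and the helper functions are assumptions made for illustration, not a prescribed attribute list; adapt them to whatever attributes your applications agree on.

```python
import json
import logging
import sys
from datetime import datetime, timezone

# Emit one JSON document per line so a log aggregator can parse each record.
logging.basicConfig(stream=sys.stdout, level=logging.INFO, format="%(message)s")
logger = logging.getLogger("handshake")


def build_handshake_record(correlation_id, application, stage, status, detail=""):
    """Build a log record; every field name here is an illustrative assumption."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "correlationId": correlation_id,
        "application": application,
        "stage": stage,    # "ENTRY" or "EXIT"
        "status": status,  # e.g. "STARTED", "SUCCESS", "FAILED"
        "detail": detail,
    }


def log_handshake(correlation_id, application, stage, status, detail=""):
    """Serialize the record as JSON, log it, and return the logged line."""
    line = json.dumps(
        build_handshake_record(correlation_id, application, stage, status, detail)
    )
    logger.info(line)
    return line


cid = "BalRepAlteryxf56eaf7f-d8b4-4aeb-87a0-dcbe059339ae"
entry_line = log_handshake(cid, "AlteryxWorkflow", "ENTRY", "STARTED")
exit_line = log_handshake(cid, "AlteryxWorkflow", "EXIT", "SUCCESS", "rows persisted")
```

Logging one record on entry and one on exit in every application means a missing exit record points directly at the application where the event stopped.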
The DevOps team can create a consolidated view over multiple Splunk indices for the applications in scope, which can be used to see the end-to-end progression of an event identified by its correlationId. In this way, Alteryx can be embedded in the overall fabric of existing enterprise applications.
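To make the idea of a consolidated view concrete, here is a small in-memory sketch in Python. It is only a stand-in for what a search across Splunk indices would do, not Splunk itself: the function `trace_event`, the record fields, and the sample records are all hypothetical, and it assumes every application writes ISO 8601 timestamps in the same timezone so that string order matches chronological order.

```python
def trace_event(correlation_id, *sources):
    """Collect records with the given correlation ID from several log sources
    and return them in chronological order.

    Assumes each record is a dict with "correlationId" and "timestamp" keys,
    and that timestamps are ISO 8601 strings in one timezone.
    """
    matching = [
        record
        for source in sources
        for record in source
        if record.get("correlationId") == correlation_id
    ]
    return sorted(matching, key=lambda record: record["timestamp"])


# Hypothetical records standing in for three separate indices.
java_app = [
    {"timestamp": "2024-01-01T10:00:00+00:00", "correlationId": "BalRepAlteryx-demo",
     "application": "JavaApp", "stage": "EXIT"},
]
alteryx = [
    {"timestamp": "2024-01-01T10:00:05+00:00", "correlationId": "BalRepAlteryx-demo",
     "application": "Alteryx", "stage": "ENTRY"},
    {"timestamp": "2024-01-01T10:00:06+00:00", "correlationId": "OtherProcess-demo",
     "application": "Alteryx", "stage": "ENTRY"},
]
rest_api = [
    {"timestamp": "2024-01-01T10:00:09+00:00", "correlationId": "BalRepAlteryx-demo",
     "application": "RestApi", "stage": "ENTRY"},
]

timeline = trace_event("BalRepAlteryx-demo", rest_api, alteryx, java_app)
for record in timeline:
    print(record["timestamp"], record["application"], record["stage"])
```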