Engine Works

Under the hood of Alteryx: tips, tricks and how-tos.
RithiS
Alteryx

"You're creating algorithms with Alteryx" was a comment made by a software engineering colleague recently. She was learning Alteryx Designer with a Weekly Challenge. Her statement helped me realize that the reason Alteryx is powerful is because algorithms are powerful. And that's why users love our tools so much. Basically, we help users that are domain experts create business logic, data pipelines, and algorithms to solve problems across all industries.

 

We have essentially transformed our users into engineers. It's just that they are using Alteryx. They've developed and productionized complex algorithms with less overhead than with a programming language. Philip Riggs has a great blog post about doing most of his work in Alteryx over Python.

 

Democratizing the creation of algorithms helps democratize data. So how has Alteryx democratized the creation of algorithms? We've abstracted away layers of complexity, making it easy to work with data types, data structures, and inputs/outputs, to group and combine data, and much more. This lets our users focus on their area of expertise instead of the overhead required in coding. Because it's easy to manipulate and configure a workflow, users can change and adjust their business logic more quickly than they could in code. Our users get the advantages of code without having to learn how to write it.

 

Alteryx is much easier to learn than coding. Users who can code can get even more out of Alteryx with the Python tool, R tool, and developer SDKs. Learning to code is a great way to get better at Alteryx, but that's a topic for another day.

 

I created a couple of simple algorithms by completing a Weekly Challenge. The challenges are really just coding problems. Many people, including Alteryx ACE and Inspire 2018 Grand Prix winner @NicoleJohnson, are regular participants. I tackled Challenge #155 with both Python and Alteryx Designer. Here's the original post by @TerryT:

 

We've received the following repeated transmission, but our communications channel is just so unreliable!
We think if we analyze each string for the most frequent character in each position we can reconstruct the message.

 

Yesterday we received the following:

Htl2!
ce+lo
ve8lz
HDlcF
u8pho

We reasoned that since we received 'H' in the first column the most times, and 'e' in the second column the most times, we think the greeting "Hello" was sent.

 

Can you help?

 

The approach is straightforward. Count each character within a column to find the most common one; the most common character in each column, read in order, forms the message. I wrote the code in the Python tool, an integration of Jupyter Notebook that I love so very much. We can assume every string is the same length, so each character's index within its string can serve as the column index in our logic.
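
As a rough illustration of that idea, here is a minimal sketch in plain Python that runs outside the Alteryx Python tool. It uses the five example strings from the challenge post rather than the real input, and the function name decode_transmission is made up for this example. It also takes a shortcut that the solution in this post does not: zip(*strings) regroups the characters by column, which relies on the same equal-length assumption.

```python
from collections import Counter


def decode_transmission(strings):
    """Return the most frequent character of each column, joined in order.

    Hypothetical helper for illustration; assumes every string has the same
    length (zip stops at the shortest string otherwise).
    """
    columns = zip(*strings)  # one tuple of characters per column
    return "".join(Counter(column).most_common(1)[0][0] for column in columns)


# Example transmission from the challenge post
received = ["Htl2!", "ce+lo", "ve8lz", "HDlcF", "u8pho"]
print(decode_transmission(received))  # Hello
```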

 

Below is the short code in its entirety. It doesn't have comments, but a file with comments is attached.

 

from ayx import Alteryx
from collections import Counter
import pandas

df = Alteryx.read('#1')
counter_list = []

def count_char(string: str):
    for i, char in enumerate(string):
        try:
            counter_list[i][char] += 1
        except IndexError:
            counter_list.append(Counter())
            counter_list[i][char] += 1

df['data'].apply(count_char)

decoded_message = ''
for counter in counter_list:
    decoded_message += counter.most_common(1)[0][0]

decoded_message = pandas.DataFrame({'decoded_message': [decoded_message]})
Alteryx.write(decoded_message, 1)

 

Let's do a quick review of the code.

 

Alteryx.read('#1') reads the incoming data stream from anchor #1. counter_list is an empty list that will store the Counter objects used to track the frequency of each character. An example of Counters is shown later.
 
from ayx import Alteryx
from collections import Counter
import pandas

df = Alteryx.read('#1')
counter_list = []
 
This is what df looks like when first read:

df_before_list.PNG

 

Here is the main part of the algorithm. With Python, I can iterate over each character in a string. As I iterate over a string, I add each character to the Counter object at the same index as the character's position, so characters from the same column of every string are tallied together. If no Counter exists yet for that index, one is created. The function count_char does all that (not to be confused with All That).

 

Once defined, the function is applied to every record in the DataFrame, populating counter_list.

 
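def count_char(string: str):
    # Tally each character under its position (column) index.
    for i, char in enumerate(string):
        try:
            counter_list[i][char] += 1
        except IndexError:
            # No Counter exists for this column yet, so create one first.
            counter_list.append(Counter())
            counter_list[i][char] += 1

# Run count_char on every record; this populates counter_list as a side effect.
df['data'].apply(count_char)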

 

After the function is applied, counter_list looks like this:

 

counter_list.PNG
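
For readers who can't see the screenshot, the following sketch rebuilds the same kind of list in plain Python. It is an approximation under a few assumptions: it uses the five example strings from the challenge post instead of the workflow's input, it loops over a list rather than calling df['data'].apply, and build_counters is a name invented here.

```python
from collections import Counter


def build_counters(strings):
    """Build one Counter per character position (hypothetical helper)."""
    counters = []
    for string in strings:
        for i, char in enumerate(string):
            if i == len(counters):
                # First time this position is seen: start a new Counter.
                counters.append(Counter())
            counters[i][char] += 1
    return counters


example = ["Htl2!", "ce+lo", "ve8lz", "HDlcF", "u8pho"]
counters = build_counters(example)

print(len(counters))     # 5, one Counter per column
print(counters[0]["H"])  # 2, 'H' appears twice in the first column
print(counters[0]["c"])  # 1
```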

 

Now that I have the character counts for each column, the next step is to identify the most common character in each one. I iterate over the list of counters and use the most_common method to pick the character for the message. The characters are collected in a string variable, which is then converted to a DataFrame and written to an output anchor.

 

decoded_message = ''
for counter in counter_list:
    decoded_message += counter.most_common(1)[0][0]
    
decoded_message = pandas.DataFrame({'decoded_message': [decoded_message]})
Alteryx.write(decoded_message, 1)
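
The double index [0][0] can look odd at first. most_common(1) returns a list holding a single (character, count) tuple, so the first [0] selects the tuple and the second [0] selects the character. A small standalone sketch, using the first column of the challenge's example strings as assumed input:

```python
from collections import Counter

# First characters of the five example strings: H, c, v, H, u
first_column = Counter("HcvHu")

top = first_column.most_common(1)
print(top)        # [('H', 2)]
print(top[0])     # ('H', 2)
print(top[0][0])  # H
```

When two characters tie for the highest count, most_common lists them in the order they were first encountered, so the character seen earliest would be the one picked.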

 

And that's it for Python! The decoded message matches what is expected:

decoded_hello.PNG

 

Here is the Alteryx version of my solution. I won't dive into the workflow, but it is attached for the curious. You can also browse the other solutions in the challenge thread, which has more than eight pages of algorithms and discussion. I won't share the message here, but you can review my attachment or attempt the challenge yourself.

 

completed_challenge.PNG

 

What's the probability that each of my algorithms can be improved? It's 100%. That's the beauty of algorithms: once a solution is found, it can be optimized. The optimization process provides plenty of opportunity to learn and add value. Optimization may not matter with small datasets, but it matters immensely if you're processing hundreds of millions or billions of records.

 

One way to show others the power of Alteryx is to work on coding problems like the Weekly Challenge. Take a series of problems and complete them using both Alteryx and your language of choice. The difference in overhead between the two will be significant.

Rithi Son
Product Manager

Rithi started at Alteryx in March 2016 as a product engineer before becoming a product manager in 2019. He has worked as a business and data analyst in ecommerce and health care business intelligence, using Excel and SQL. Rithi lives in Denver and enjoys life in the Colorado Front Range.
