
roni-1

 

Before Alteryx BUILD

 

Before attending Alteryx Inspire 2018, Ryan Andrew and I (Royden Onishi) had no idea what to expect. We had never met each other until the Monday morning of Alteryx Inspire 2018.

 

In a non-deterministic fashion, Ryan and I sat next to each other in the Hilton’s Ballroom A for the Alteryx BUILD hackathon. Having just met, we quickly learned that our skill sets complemented each other perfectly. I have a background in computer science with experience in Python, and Ryan has more experience in frontend design and Alteryx’s built-in capabilities.

 

Despite this being our first time meeting, this end-to-end knowledge helped us work extremely well together. Even personality-wise, it could not have worked out better. There were times I wanted to push the envelope by adding extra features, but Ryan helped ground me to prevent scope creep!

 

A large part of the hackathon was focused on brainstorming ideas. Deep learning has been getting a lot of attention in recent years. Andrew Ng is a rockstar in the AI and deep learning community, and he has a popular deep learning specialization on Coursera. Through those courses, we saw the potential to incorporate deep learning into Alteryx using the new Python SDK. With this new Alteryx feature, we decided to build a tool called the Image Vectorizer. This tool takes a directory of images and vectorizes them for potential image processing in Alteryx. We used the AYX Tool Scaffolder and the new Alteryx Python SDK in the development of this tool.

 

Image Vectorizer Tool

 

What the Image Vectorizer tool does is fairly simple. It takes a directory of images, normalizes the image sizes, and decodes each image into a vector. For this tool, normalization simply means resizing all the images to the same dimensions. The parameter “n” specifies how the images are resized. For example, if the user selects the value 100 for “n”, the tool will resize all images to 100x100 pixels. This assumes that all the images have the same aspect ratio. The right value of “n” depends on the architecture of the neural network that will consume the vectors, and choosing it lets the user trade processing speed against accuracy. The closer the resized image is to its original resolution, the more information is captured, but the longer it takes to process, and vice versa. For example, if an image is 500x500 pixels and you do not lose too much information by resizing it to 100x100, then resizing may be worthwhile to speed up the training of your neural network.
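To make the size trade-off concrete, here is a minimal sketch using Pillow. The synthetic in-memory image stands in for a real file, and the helper name `vector_length` is hypothetical and not part of the tool. It only shows how the number of values per image grows with “n” when four channels per pixel are kept.

```python
from PIL import Image


def vector_length(n, channels=4):
    """Number of values produced for an n x n image with the given channel count."""
    return n * n * channels


# A synthetic 500x500 RGBA image stands in for a real file on disk (assumption).
original = Image.new("RGBA", (500, 500), (200, 30, 30, 255))

# Resizing to n x n, here with n = 100.
resized = original.resize((100, 100))

print(resized.size)        # (100, 100)
print(vector_length(500))  # 1000000
print(vector_length(100))  # 40000
```

Going from 500x500 to 100x100 cuts the vector from one million values to forty thousand, which is the kind of saving the paragraph above has in mind.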

 

The image is then decoded and converted into a vector, which is just a list of numbers. Because the images are RGBA encoded, each pixel has four features:

 

  1. Red Intensity
  2. Green Intensity
  3. Blue Intensity
  4. Alpha Channel (Opaqueness of Pixel)

The tool concatenates the pixel features into a vector as follows (only RGB shown, alpha channel was omitted):

 

23   49   10   21   ...   102   21   94   12   ...   201   23   0   12   ...

 

Each feature is ordered by pixel in a left-to-right, top-to-bottom fashion.

 

Pixel #   1    2    3    4    ...   1     2    3    4    ...   1     2    3   4    ...
Value     23   49   10   21   ...   102   21   94   12   ...   201   23   0   12   ...
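The ordering can be illustrated with a tiny NumPy sketch. The 2x2 image below reuses the values from the table above, with an assumed alpha of 255 for every pixel. Which layout the tool itself emits is not confirmed here: the table restarts the pixel numbering for each group of values, which suggests grouping by channel, whereas a plain reshape of a decoded array keeps each pixel's four values together. Both are shown so the difference is visible.

```python
import numpy as np

# A hypothetical 2x2 RGBA image: rows run top to bottom, pixels left to right.
pixels = np.array(
    [[[23, 102, 201, 255], [49, 21, 23, 255]],
     [[10, 94, 0, 255], [21, 12, 12, 255]]],
    dtype=np.uint8,
)

# Default row-major flattening keeps each pixel's R, G, B, A values together.
interleaved = pixels.reshape(-1)

# Moving the channel axis first groups all reds, then greens, blues, and alphas.
by_channel = pixels.transpose(2, 0, 1).reshape(-1)

print(interleaved[:8].tolist())   # [23, 102, 201, 255, 49, 21, 23, 255]
print(by_channel[:12].tolist())   # [23, 49, 10, 21, 102, 21, 94, 12, 201, 23, 0, 12]
```

Either layout carries the same information. What matters is that whatever consumes the vector assumes the same ordering that produced it.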

 

If you include a CSV file with image filenames and labels, the tool will pair the vectors with the appropriate label. These labels can be used for the classification of images. Inspired by Andrew Ng’s deep learning specialization, we wanted to use the output of the Image Vectorizer tool to classify images. The original goal was to determine if it was a cat image or not. This led us to our team name “Cats Rule, Dogs Drool”.
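As a rough sketch of the label pairing, the snippet below joins vectors to labels by filename. The CSV column names `filename` and `label`, the helper names, and the sample data are all assumptions made for illustration. The tool's actual CSV layout is not described here.

```python
import csv
import io


def load_labels(csv_file):
    """Map each image filename to its label from a two-column CSV."""
    reader = csv.DictReader(csv_file)
    return {row["filename"]: row["label"] for row in reader}


def pair_with_labels(vectors, labels):
    """Return (filename, label, vector) rows; files without a label get None."""
    return [(name, labels.get(name), vec) for name, vec in vectors.items()]


# In-memory stand-ins for a real CSV file and real image vectors (assumptions).
csv_text = "filename,label\ncat1.png,cat\ndog1.png,not_cat\n"
labels = load_labels(io.StringIO(csv_text))
vectors = {"cat1.png": [23, 102, 201, 255], "dog1.png": [49, 21, 23, 255]}

rows = pair_with_labels(vectors, labels)
print(rows[0])  # ('cat1.png', 'cat', [23, 102, 201, 255])
```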

 

AYX Tool Scaffolder

 

For the BUILD Hackathon, the developers at Alteryx made available the new AYX Tool Scaffolder. In four easy steps, it allows a user to set the metadata, specify incoming and outgoing connections, design the GUI, import the tool into Alteryx and generate all the framework files necessary to create a Python, JavaScript or Macro tool.

 

Soooo convenient and easy...

AYX Tool Scaffolder is very intuitive and straightforward. In no time, we were able to create the base files to build the Image Vectorizer Tool in Alteryx.

 

Alteryx Python SDK

 

The tool uses Pillow (a fork of the Python Imaging Library) and NumPy. Pillow decodes each image and resizes it to the designated size, and NumPy flattens the decoded pixel values into a single array. Here is a snippet of the function used in Python:

 

# Module-level imports the snippet relies on.
import AlteryxPythonSDK as Sdk
import numpy as np
from PIL import Image

def process_file_path(self, filepath, n):
    # Outputs info to Alteryx log.
    self.parent.alteryx_engine.output_message(
        self.parent.n_tool_id, Sdk.EngineMessageType.info, filepath)
    # Uses Pillow to decode the image as RGBA (four values per pixel).
    image = Image.open(filepath).convert('RGBA')
    # Normalizes the size of the image to n x n pixels.
    norm_image = image.resize((n, n))

    # Uses NumPy to flatten the decoded image into a vector of length n*n*4.
    return np.array(norm_image).reshape(n * n * 4)
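The snippet above depends on the Alteryx engine for logging, so it cannot be run on its own. The following is a minimal standalone sketch of the same decode, resize and flatten steps applied to a whole directory. The function name `vectorize_directory`, the PNG-only filter and the throwaway test images are assumptions for illustration and do not reflect the tool's actual file handling.

```python
import tempfile
from pathlib import Path

import numpy as np
from PIL import Image


def vectorize_directory(directory, n):
    """Return {filename: vector} for every PNG file in the directory."""
    vectors = {}
    for path in sorted(Path(directory).glob("*.png")):
        with Image.open(path) as image:
            # Decode as RGBA and resize to n x n, as in the snippet above.
            resized = image.convert("RGBA").resize((n, n))
        # Flatten the (n, n, 4) array into a vector of length n*n*4.
        vectors[path.name] = np.array(resized).reshape(-1)
    return vectors


# Build a throwaway directory with two synthetic images to exercise the helper.
with tempfile.TemporaryDirectory() as tmp:
    Image.new("RGB", (50, 40), (255, 0, 0)).save(Path(tmp) / "red.png")
    Image.new("RGB", (30, 30), (0, 0, 255)).save(Path(tmp) / "blue.png")
    vectors = vectorize_directory(tmp, n=10)

print(sorted(vectors))           # ['blue.png', 'red.png']
print(vectors["red.png"].shape)  # (400,)
```

Note that the 50x40 image is stretched to 10x10 here, which is why the article assumes that all input images share the same aspect ratio.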

 

It was our first time working with the Python SDK. Luckily, there were a lot of engineers from Alteryx ready to guide us through the process. Michael Chadwick very patiently sat down with us, explained how the Python SDK works, and walked us through the code line by line. JP Kabler and William Thompson explained how to import dependencies and how to organize the YXI file to package the Python tool. Being able to talk to the engineers who actually developed the Python SDK made attending the BUILD hackathon very valuable. We learned a lot from the engineers; in turn, we could bring that knowledge back to our respective companies to create some really valuable tools.

 

Limitations and Improvements

 

The Image Vectorizer tool outputs each feature as a column in Alteryx. For example, if an “n” value of 300 was chosen, then there would be a maximum of 360002 (300*300*4 + 2) columns in the output. The 2 corresponds to the filename and file label, and the 4 corresponds to the 4 features per pixel in RGBA encoding. The sheer number of columns causes a significant slowdown in Alteryx. A possible improvement would be to encode the vector in a single column in a way that can be easily parsed.

The tool only supports decoding in RGBA. An improvement could be to allow the user to decode in different formats, such as grayscale, for dimensionality reduction. Another improvement would be to give the user the ability to normalize the pixel values of each image. This would entail finding the mean of the image's pixel values, subtracting the mean from every value, and dividing the results by the standard deviation. Normalizing the pixel values in this way would make training the neural networks easier.
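None of these improvements were implemented in the tool, but a rough sketch of what they might look like follows. The helper names, the comma-separated string format and the small `eps` guard against division by zero are all assumptions chosen for illustration.

```python
import numpy as np
from PIL import Image


def to_grayscale_vector(image, n):
    """Resize to n x n and decode as 8-bit grayscale (one value per pixel)."""
    gray = image.convert("L").resize((n, n))
    return np.asarray(gray).astype(np.float64).reshape(-1)


def standardize(vector, eps=1e-8):
    """Subtract the mean, then divide by the standard deviation."""
    return (vector - vector.mean()) / (vector.std() + eps)


def encode_single_column(vector):
    """Join integer pixel values into one comma-separated string."""
    return ",".join(str(int(v)) for v in vector)


# A synthetic 8x8 gradient converted to RGBA stands in for a real image.
gradient = np.arange(64, dtype=np.uint8).reshape(8, 8) * 4
image = Image.fromarray(gradient).convert("RGBA")

gray = to_grayscale_vector(image, 4)
normalized = standardize(gray)

print(gray.shape)                       # (16,)
print(encode_single_column([1, 2, 3]))  # 1,2,3
```

With grayscale decoding, an “n” of 300 would yield 90000 values per image instead of 360000, and a single encoded column would avoid the very wide output altogether, at the cost of a parsing step downstream.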

 

Future Development

 

The time allotted at the Alteryx BUILD was limited, and there were a lot of ideas we wanted to implement but did not have the time to do. Given unlimited time, we would have liked to implement the YOLO algorithm into Alteryx using the Python SDK. The YOLO algorithm performs object detection in images using deep learning. More information on the YOLO algorithm can be found here.

 

Example of YOLO output… that is a nice tie…

With the Python SDK, we could really do anything in terms of development in Alteryx, which is what makes the Python SDK so cool! There could be an opportunity to apply pre-trained neural networks to make available different abilities such as character recognition, voice to text, object detection, etc. With the Python SDK the possibilities are limitless.

 

So powerful...

Final Thoughts

 

Ryan and I both had a blast at the Alteryx BUILD hackathon. Alteryx BUILD brought together Alteryx users and pushed the boundaries of what the platform is capable of. The friendliness, passion and enthusiasm of those who attended the conference were contagious. Alteryx Inspire 2018 left us motivated to solve big problems and to create tools and models with significant impact. We look forward to what the future holds for machine learning and data science and are excited to see Alteryx evolve as part of that movement. We can’t wait to compete again at Inspire 2019 in Nashville, TN!

 

Pictured (L-R): Royden Onishi, Ryan Andrew and Tasha Alfano (Product Manager for Alteryx Developer Tools)
