Weekly Challenge
Do you have the skills to make it to the top? Subscribe to our weekly challenges. Try your best to solve the problem, share your solution, and see how others tackled the same problem. We share our answer too.

Challenge #145: SANTALYTICS 2018 - Part 1

Alteryx

@mbarone @mattagone @Pepper YAY ROCHESTER ALTERYX USER GROUP! Thanks for providing a solution!

Fireball
Spoiler

Capture.PNG
Capture2.PNG

Great challenge! 

Here is my solution. Fun one! As I have been looking for a charity in my home state to take advantage of tax credits, this is also very timely.

 

Spoiler
challenge_145_completed.jpg
Meteoroid

I definitely did more work than I needed to, but I started by pulling in all the data first and then figured out what I needed.

 

Spoiler
2018-12-14 16_38_29-Alteryx Designer x64 - Santalytics Part 1.yxmd_.png
Way too many tools with dead ends.
2018-12-14 16_39_11-Alteryx Designer x64 - Santalytics Part 1.yxmd_.png
Came up with A Precious Child Inc.

 

Alteryx Partner

My solution:

 

Spoiler
145..png

 

 

My solution for Part 1.

Spoiler
AWC_145.JPG

 

 

Alteryx

I did the same as most, but went for separate filters.

Two reasons:

  1. You can see from the annotations what each filter step is doing, so you don't have to look into the expression.
  2. It's more efficient with the data.
Spoiler
Workflow.png
Bolide

That's interesting @JoeS - I'd always assumed a single filter would be more efficient. Just out of interest, why do multiple filters make it more efficient?

Andy 

Alteryx

@andyuttley I believe it's the way that the expression is compiled in the Alteryx Engine.

 

If you have them all in one filter, each record is checked against each condition. So in this example all 300K records are checked against all 6 conditions in one filter.

If you have them in separate filters, records are removed from the process in a waterfall fashion.

 

It's most efficient to have your biggest remover (a made-up phrase, but I mean the condition that the most records fail) as the first filter, and to continue in that order, so there are fewer records for each subsequent condition to check against.
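Not from the post itself, but here is a minimal plain-Python sketch of that "biggest remover first" effect, counting how many condition checks each ordering performs. The record counts and the two conditions are invented purely for illustration:

```python
# Sketch: cascaded filters, counting condition evaluations per ordering.
# The data and the conditions are made up just to show the effect.
import random

random.seed(0)
records = [{"a": random.random(), "b": random.random()} for _ in range(300_000)]

def cascade(conds):
    """Apply filters one after another, like chained Filter tools."""
    checks = 0
    survivors = records
    for cond in conds:
        kept = []
        for r in survivors:
            checks += 1            # one check per record per filter
            if cond(r):
                kept.append(r)
        survivors = kept
    return checks

strict = lambda r: r["a"] < 0.01   # removes ~99% of records
loose  = lambda r: r["b"] < 0.90   # removes ~10% of records

print("biggest remover first:", cascade([strict, loose]))  # ~303,000 checks
print("biggest remover last: ", cascade([loose, strict]))  # ~570,000 checks
```

With the strictest condition first, the second filter only sees the roughly 1% of records that survived, which is the waterfall behaviour described above.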

 

It's not really going to make a difference on only 300k records.

I just did some testing: 22.5 seconds on average for one filter and 21 seconds for 6 filters (in the order from the post, not the most efficient order).

That's quite considerable actually, as it's the Auto Field tool that takes most of the time in my workflow (performance profiling says 18.5 seconds).

Although, what is weird is that if you add up the individual tool times from performance profiling, you get 145ms for the 6 filters vs 117ms for the 1 filter.

My guess here is that, as the timings are so small, the overhead that performance profiling adds to each tool slows things down, since the workflow consistently runs quicker with 6 filters when performance profiling is off.

 

I found this out on processes years back with hundreds of millions of records. So we are talking really small margins with 300k records.

Alteryx Certified Partner

Wooooooooooooooooooooooooooooo!!! Santalytics!!

 

 

nerd.gif

 

Anyway, mine was probably similar to the others.

 

Spoiler
I first downloaded the raw data and the charity list from the links provided and dropped them into temporary files. The .zip file needed a bit of SQL replace magic, but nothing complicated. I then joined them on the EIN field:

Raw Data Download.PNG
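(Not part of the original post: for anyone following along outside Designer, the join step above looks roughly like this in pandas. The file names and EIN column label are assumptions; in the workflow this is download tools feeding a Join tool.)

```python
# Rough pandas equivalent of the EIN join, not the actual workflow.
# File names and the exact "EIN" column label are assumptions.
import pandas as pd

raw = pd.read_csv("raw_data.csv", dtype={"EIN": str})        # extracted from the .zip
charities = pd.read_csv("charity_list.csv", dtype={"EIN": str})

# Inner join on EIN, like the J output of Alteryx's Join tool.
joined = raw.merge(charities, on="EIN", how="inner")
print(len(joined), "matched records")
```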

I then downloaded the Data Dictionary using the same method and dynamically renamed the columns from the first row of data:

Data Dictionary Download.PNG
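(Again as a sketch rather than the author's workflow: the "rename columns from the first row of data" trick, which Dynamic Rename handles in Designer, looks something like this in pandas.)

```python
# Sketch of "take field names from the first row of data" in pandas.
# Assumes the dictionary arrives with generic headers and the real
# field names sitting in row 0.
import pandas as pd

dictionary = pd.read_csv("data_dictionary.csv", header=None)

dictionary.columns = dictionary.iloc[0]                 # promote first row to headers
dictionary = dictionary.iloc[1:].reset_index(drop=True)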
Then I simply used a Dynamic Rename tool to take the names from the dictionary descriptions, applied the appropriate filters, and calculated the Efficiency, sorting accordingly:

Dynamic rename and finalise.PNG
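(One last sketch for that final step. The column names, the filter condition, and the Efficiency formula here are hypothetical placeholders; the real ones come from the challenge brief.)

```python
# Sketch of rename-from-dictionary, filter, compute Efficiency, sort.
# Column names and the Efficiency formula are hypothetical placeholders.
import pandas as pd

joined = pd.read_csv("joined_data.csv")                # output of the EIN join
dictionary = pd.read_csv("data_dictionary_named.csv")  # dictionary with headers fixed

# {field code -> plain-English description}, like Dynamic Rename.
mapping = dict(zip(dictionary["Code"], dictionary["Description"]))
named = joined.rename(columns=mapping)

# Placeholder filter, then a hypothetical efficiency ratio, sorted best-first.
eligible = named[named["Total Revenue"] > 0]
eligible = eligible.assign(
    Efficiency=eligible["Total Expenses"] / eligible["Total Revenue"]
).sort_values("Efficiency")
print(eligible.head())
```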

And here are the results:

Results.PNG

* "Patiently" awaits the next instalment