
Alteryx Designer Desktop Discussions

SOLVED

Question Re: Weekly Challenge 377 & the Sample Tool

tbrook207
6 - Meteoroid

Pardon what may be a simple question about a recent weekly challenge (Challenge #377: Video Game Sales Insights - Alteryx Community). I was able to work out most of the solution on my own, but I couldn't match the expected results. Looking at the posted solutions, I see that many of them use the Sample tool, and I don't understand why it would be a good idea. If anyone has completed this challenge and used the tool, would you mind explaining its purpose in the workflow? Below is my current workflow, which doesn't use the Sample tool and is giving incorrect results.

[Image: workflow.PNG, current workflow without the Sample tool]

Luke_C
17 - Castor

Hi @tbrook207

Here's my solution. In my case, the Sample tool is configured to take the first row for each year. Because it comes after the sort, it returns the record with the highest sales in each year.

[Image: Capture.PNG, Luke_C's solution workflow]
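
To make the sort-then-first-row idea concrete, here is a minimal Python sketch that roughly mimics a "First N rows, grouped by year" selection after a sort. The column names (year, publisher, sales), the helper first_n_per_group, and the values are all hypothetical and made up for illustration; this is not how Alteryx implements the Sample tool.

```python
from itertools import groupby, islice
from operator import itemgetter

# Hypothetical per-publisher, per-year sales totals (made-up values, not the challenge data).
rows = [
    {"year": 2009, "publisher": "C", "sales": 12.3},
    {"year": 2008, "publisher": "A", "sales": 40.0},
    {"year": 2009, "publisher": "A", "sales": 61.2},
    {"year": 2008, "publisher": "B", "sales": 55.5},
]


def first_n_per_group(records, group_key, n=1):
    """Keep the first n records of each run of equal group_key values.

    itertools.groupby only groups *consecutive* records, so the input must
    already be sorted by group_key (the sort below takes care of that).
    """
    kept = []
    for _, group in groupby(records, key=itemgetter(group_key)):
        kept.extend(islice(group, n))
    return kept


# Sort step: year ascending, then sales descending (negated for descending order).
sorted_rows = sorted(rows, key=lambda r: (r["year"], -r["sales"]))

# "First N rows, grouped by year" with N = 1: the top-selling publisher per year.
top_by_year = first_n_per_group(sorted_rows, "year", n=1)
for r in top_by_year:
    print(r["year"], r["publisher"], r["sales"])
```

The order matters: because sales are sorted high to low within each year, "the first row" and "the highest sales" become the same thing.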

acarter881
12 - Quasar

Hello, @tbrook207.

The objective is to find the number of years in which the publisher with the maximum sales also has the maximum number of game titles.

Before my Sample tool, I have the data sorted by year (ascending) and sales (descending).

Within the Sample tool, I group by year and return the first row from each year.

Where that publisher's count of game titles equals the maximum count of game titles for that year, the publisher who made the most game titles also had the most sales (this holds because the sales were sorted from high to low).
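
The comparison step can be sketched in Python as follows. This is a minimal illustration under assumptions: the column names (year, publisher, sales, titles) and the values are hypothetical, and max() stands in for the Sort + Sample steps (it picks the same top seller unless there are ties).

```python
from collections import defaultdict

# Hypothetical per-publisher, per-year totals (made-up values, not the challenge data).
rows = [
    {"year": 2008, "publisher": "A", "sales": 40.0, "titles": 12},
    {"year": 2008, "publisher": "B", "sales": 55.5, "titles": 9},
    {"year": 2009, "publisher": "A", "sales": 61.2, "titles": 15},
    {"year": 2009, "publisher": "C", "sales": 12.3, "titles": 7},
]

# Group the rows by year.
by_year = defaultdict(list)
for r in rows:
    by_year[r["year"]].append(r)

matching_years = []
for year, group in sorted(by_year.items()):
    # Top seller for the year: the row a sort (sales descending) plus
    # "first row per year" would keep, barring ties.
    top_seller = max(group, key=lambda r: r["sales"])
    # Highest number of game titles any publisher released that year.
    max_titles = max(r["titles"] for r in group)
    if top_seller["titles"] == max_titles:
        matching_years.append(year)  # the top seller also released the most titles

print(len(matching_years), matching_years)
```

With these made-up numbers, only 2009 counts: publisher A has both the highest sales and the most titles, while in 2008 the top seller (B) did not release the most titles.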

ArnaldoSandoval
12 - Quasar

Hi @tbrook207 

Let's address your question first:

  • Why do we use the Sample tool in this solution?

The Sample tool is a filter that allows us to reduce the data stream by selecting records per group (for example, per year); it behaves like the HAVING clause in SQL (see "SQL HAVING Clause"). Perhaps it got an unfortunate name, "Sample", which could be misleading (see "Sample").

The challenge wants us to find the publisher with the most global sales per year, and the best-suited tool for that is the Sample tool. We feed the tool a data stream pre-sorted by Year, Total Global Sales and Unique Games Released (the two measures in descending order); then we configure the Sample tool to pick the First N Rows, grouped by Year, with N = 1. In other words, it returns one record per year: the publisher with the most sales in that year.

We do likewise with Unique Games Released (because the challenge also wants us to find the publisher with the most unique games released in a year). We sort a second data stream, this time by Year, Unique Games Released and Total Global Sales, and the Sample tool again picks the First N Rows grouped by Year. In the same way, it returns one record per year: the publisher with the most games released.

That's the reason the workflow looks like this:

[Image: Sample-Tool-v01.png, ArnaldoSandoval's workflow]
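
Here is a minimal Python sketch of the two sorted streams described above, each reduced to its first row per year. The column names, the helper first_row_per_year, the made-up values, and the final comparison by publisher name are all assumptions for illustration; it shows why the secondary sort key matters when publishers tie on the primary one.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical per-publisher, per-year totals (made-up values, not the challenge data).
rows = [
    {"year": 2008, "publisher": "A", "sales": 40.0, "games": 12},
    {"year": 2008, "publisher": "B", "sales": 55.5, "games": 9},
    {"year": 2009, "publisher": "A", "sales": 61.2, "games": 15},
    {"year": 2009, "publisher": "C", "sales": 12.3, "games": 15},
]


def first_row_per_year(records, sort_key):
    """Sort, then keep the first record of each year (like First 1 row, grouped by year)."""
    ordered = sorted(records, key=sort_key)
    return {year: next(group) for year, group in groupby(ordered, key=itemgetter("year"))}


# Stream 1: year ascending, sales descending, games descending -> top publisher by sales.
top_sales = first_row_per_year(rows, lambda r: (r["year"], -r["sales"], -r["games"]))
# Stream 2: year ascending, games descending, sales descending -> top publisher by games.
# In 2009, A and C tie on games, so the sales tiebreaker decides (A wins).
top_games = first_row_per_year(rows, lambda r: (r["year"], -r["games"], -r["sales"]))

# Compare the two streams year by year: does the same publisher win both?
same = {y: top_sales[y]["publisher"] == top_games[y]["publisher"] for y in top_sales}
print(same)
```

Without the secondary key, the 2009 winner of stream 2 would depend on input order rather than on the data.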

Hope this helps,

Arnaldo

jdminton
12 - Quasar

Great explanation, @ArnaldoSandoval. I do think the tool is named appropriately, though, since its main function is to take a sample of records. Auditors would use this type of tool to select a certain number or percentage of records from various categories. It can also be used during development to run a workflow on a sample of the data, so you can test it without processing the full dataset.

@tbrook207 As far as the Sample tool is concerned for this challenge, it isn't required. The attached workflow shows a solution without the Sample tool. (I would also post an image, but I haven't been able to do that lately for some reason.)
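
For illustration only, here is one possible Sample-free approach sketched in Python; it is not necessarily what the attached workflow does, which isn't reproduced here. It computes the maximum sales per year (roughly what a Summarize grouped by year with Max of sales would give) and then keeps the rows that hit that maximum. The column names and values are hypothetical.

```python
# Hypothetical per-publisher, per-year totals (made-up values, not the challenge data).
rows = [
    {"year": 2008, "publisher": "A", "sales": 40.0},
    {"year": 2008, "publisher": "B", "sales": 55.5},
    {"year": 2009, "publisher": "A", "sales": 61.2},
    {"year": 2009, "publisher": "C", "sales": 61.2},
]

# Step 1: maximum sales per year.
max_sales = {}
for r in rows:
    max_sales[r["year"]] = max(max_sales.get(r["year"], float("-inf")), r["sales"])

# Step 2: match each row back to its year's maximum and keep the rows that equal it.
winners = [r for r in rows if r["sales"] == max_sales[r["year"]]]

for r in winners:
    print(r["year"], r["publisher"])
```

One design difference worth noting: this approach keeps every publisher tied for the maximum (both A and C in 2009 here), whereas sorting and taking the first row per year always keeps exactly one.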

ArnaldoSandoval
12 - Quasar

Thanks @jdminton for the feedback,

To me, the term "Sample" implies randomness, which I cannot see the Sample tool doing; it slices the data more like the SQL HAVING clause. Anyway, I was just trying to write a little about the Sample tool, as @tbrook207 asked.

Arnaldo

jdminton
12 - Quasar

I do agree with your comparison to the HAVING clause, but the term "sample" doesn't imply randomness on its own; that's why the term "random sample" exists. Sampling is just a selection from a group. While random sampling is probably the type most people have heard of, there are five different types of sampling. As an example, check out this link to see the different sampling methods used for statistical purposes: https://people.richland.edu/james/lecture/m170/ch01-not.html#:~:text=There%20are%20five%20types%20of....
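
To illustrate the point that "sample" covers more than random selection, here is a small Python sketch of a few selection schemes over hypothetical record IDs. The labels follow common statistics usage; the data is made up, and this is not how the Alteryx Sample tool is implemented.

```python
import random

records = list(range(1, 21))  # 20 hypothetical record IDs

# Positional: take the first N records (no randomness at all).
first_n = records[:5]

# Systematic: take every k-th record (no randomness once the start is fixed).
every_4th = records[::4]

# Simple random: each record has the same chance of being chosen.
rng = random.Random(42)  # seeded so the run is repeatable
random_5 = rng.sample(records, 5)

# Stratified: sample separately within each group (here, odd vs. even IDs).
strata = {"odd": [r for r in records if r % 2], "even": [r for r in records if r % 2 == 0]}
stratified = {name: rng.sample(members, 2) for name, members in strata.items()}

print(first_n, every_4th, random_5, stratified)
```

Only the last two involve chance; the first two are still samples in the sense of "a selection from a group".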

tbrook207
6 - Meteoroid

Thank you, everyone, for the answers. I'm still in a bit of a fog about why it's used in the solution (or one variation of the solution), but I have a somewhat better understanding. Much appreciated, my friends!

@ArnaldoSandoval & @jdminton, you two are completely correct on my understanding of "sample". Coming from the PM space, whenever we asked for samples (materials, bandwidth verifications, completed projects, etc.), it was always implied that it would be a random pull; otherwise, we would ask for particular resources to review. I wouldn't have thought that sample and random sample were different things, but I understand that they absolutely can be. So cheers to you two for that.
