Hi everyone!
I have a random sampling problem to solve. It's really hard for me to share the real use case and data, so I made up the following case, which fits what I need:
Attached is the Top Spotify songs data from Kaggle ("Top Spotify songs from 2010-2019 - BY YEAR").
I want to make a workflow that randomly selects a playlist meeting all the criteria below:
1. The whole playlist is as close to 75 minutes as possible (duration is in seconds under "dur" in the dataset)
2. "dance pop" makes up no more than 25% of the playtime
3. No "artist" is repeated, so every artist has an equal chance
4. The playlist covers songs from at least 5 different years
Does anyone know how to solve this?
Thanks!🙏
Here you go, this should work (hopefully haha).
I made an iterative macro to search for your required playlist. Within the macro, the first thing I did was give every "dance pop" song only a 1 in 3 chance of being selected for data processing at all. The problem is that 54% of all songs are in that genre, so you have to cut out some of them before further processing. Let me know if this works out for you!
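For readers without Alteryx, the same idea can be sketched in Python. This is only an illustration of the approach, not the macro itself: the function name `pick_playlist`, the 60-second tolerance around 75 minutes, the greedy fill, and the retry limit are all assumptions. The column names "top genre", "artist", "year" and "dur" are taken from the dataset.

```python
import random


def pick_playlist(songs, target=75 * 60, tolerance=60, keep_one_in=3,
                  max_tries=10_000, seed=None):
    """Retry random draws until one meets all four criteria.

    songs: list of dicts with keys 'artist', 'top genre', 'year', 'dur'.
    tolerance (seconds) and max_tries are assumed values, not from the macro.
    """
    rng = random.Random(seed)
    for _ in range(max_tries):
        # Thin out dance pop: each such song survives with probability 1/keep_one_in.
        pool = [s for s in songs
                if s["top genre"] != "dance pop" or rng.randrange(keep_one_in) == 0]
        rng.shuffle(pool)

        # Greedy fill: add songs in random order, skipping repeated artists
        # and songs that would overshoot the target by more than the tolerance.
        playlist, artists, total = [], set(), 0
        for s in pool:
            if s["artist"] in artists or total + s["dur"] > target + tolerance:
                continue
            playlist.append(s)
            artists.add(s["artist"])
            total += s["dur"]

        # Check the remaining criteria; if any fails, draw again.
        if abs(total - target) > tolerance:
            continue
        dance = sum(s["dur"] for s in playlist if s["top genre"] == "dance pop")
        years = {s["year"] for s in playlist}
        if dance <= 0.25 * total and len(years) >= 5:
            return playlist
    return None  # no valid playlist found within max_tries
```

Because every attempt is random, a single pass can easily come back empty, so the loop simply keeps drawing until a candidate passes every check.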
Greetings,
Seb
Thanks so much for the fast feedback, your help is much appreciated!! 😀
I'm a (very) beginner with macros, so it may take some time for me to study and digest 🙂
A few questions here:
1. Is the reason for using a macro so that the workflow can iterate until it produces a fitting answer? (It looks like when I just run the workflow itself it yields different results, e.g. 0 playlists.)
2. In my real-life case, the "dance pop" % of the overall population unfortunately changes every time I run it. Imagine I'm given a "monthly top playing songs" list every single month (instead of yearly) and I need to run this workflow after each month-end. We don't know how many more dance pop songs will come into the population each month, so we won't know the % until month-end. What would be the best way to handle this part and automate it? Earlier I was trying to use the random % tool but wasn't sure how.
3. While I need more time to study the workflow/macro, I'm curious about this part of the filter:
[top genre] = "dance pop"
AND
[Duration per genre]/[Total duration] > 0.25
Why do we then use the "True" output instead of "False"? (Yet when I ran it multiple times, the result clearly fit the criterion of less than 25% of the playtime 😆)
May I send you a private message to ask further?
Thanks!
Hi! No problem, glad to help out haha 🙂
Here are my answers to your questions:
1. Yep, you need something that iterates (a macro in Alteryx) to search for the answer, as a lot of combinations won't fit your requirements.
2. I made a new version for you that should help you out with this. First, run the workflow without the anchor attached to the macro. In the browse tool you'll find the % of dance pop songs in your entire dataset. From that you can determine the N (from 1 in N) for the sampling tool, and you can now enter that N by clicking on the macro (see picture 2). So, basically, it's adjustable per dataset you feed it :-). With some studying you should be able to change the macro from using years (as in your first requirement) to months (I'll leave that to you as a challenge haha).
3. I made 3 threads in the macro that just check whether your requirements are met (see picture 3).
The first one checks whether 5 years are included in the selected set of songs (if not, the count tool returns a value > 0).
The second one checks whether the total duration of dance pop songs is over 25% of the total duration of the selected set (if so, the count tool returns 1).
The third one checks whether each artist is included only once in the selected set (if not, the count tool returns 1 for every artist that is represented multiple times).
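The three checks can be read as "violation counters": each one returns 0 when its requirement is met and something above 0 when it is not. That would also explain why the filter's "True" output is used, since "True" marks a set that breaks the 25% rule. A minimal Python sketch of the same logic follows. The function names and the exact value returned for missing years are assumptions, not taken from the macro.

```python
from collections import Counter


def violation_counts(playlist, max_share=0.25, min_years=5):
    """Return one counter per requirement; 0 means the requirement is met.

    playlist: list of dicts with keys 'artist', 'top genre', 'year', 'dur'.
    """
    total = sum(s["dur"] for s in playlist)
    dance = sum(s["dur"] for s in playlist if s["top genre"] == "dance pop")
    years = {s["year"] for s in playlist}
    artist_counts = Counter(s["artist"] for s in playlist)
    return {
        # Thread 1: > 0 when fewer than min_years distinct years are covered.
        "missing_years": max(0, min_years - len(years)),
        # Thread 2: 1 when dance pop exceeds max_share of the total duration.
        "dance_pop_over_limit": int(total > 0 and dance / total > max_share),
        # Thread 3: 1 for every artist that appears more than once.
        "repeated_artists": sum(1 for n in artist_counts.values() if n > 1),
    }


def is_valid(playlist):
    """A candidate set is accepted only when every counter is 0."""
    return all(v == 0 for v in violation_counts(playlist).values())
```

A candidate is kept only when all three counters are 0; otherwise the search iterates again.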
Sure, you can send me a private message!
Just to be clear, I made this macro quite quickly. You could even have it determine the % share of the dance pop genre in the total set by itself and then let it set the N of the sample tool based on that. It would just take some more work, but you could do that.
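As a rough illustration of that automation, here is one possible way to derive N from the data in Python. The rule used is an assumption, not what the macro does: pick the smallest N for which keeping 1 in N dance pop songs brings the genre's expected share of the pool down to the 25% limit. It works on song counts rather than durations, so it is only an approximation, and the function name `one_in_n` is made up for this sketch.

```python
import math


def one_in_n(songs, genre="dance pop", max_share=0.25):
    """Return (share, n): the genre's share by song count, and the smallest n
    such that keeping 1 in n songs of that genre gives an expected share of
    at most max_share in the thinned pool.
    """
    total = len(songs)
    if total == 0:
        raise ValueError("empty dataset")
    in_genre = sum(1 for s in songs if s["top genre"] == genre)
    others = total - in_genre
    share = in_genre / total
    if share <= max_share:
        return share, 1  # no thinning needed
    if others == 0:
        raise ValueError("every song is in the genre; the share cannot be reduced")
    # Solve (g/n) / (g/n + others) <= max_share for n, where g = in_genre.
    n = math.ceil(in_genre * (1 - max_share) / (max_share * others))
    return share, n
```

With 54% dance pop this rule gives N = 4. The 1 in 3 used earlier in the thread leaves roughly 28% dance pop in the pool on average, which the iteration then filters down, so either value can work.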
Greetings,
Seb
picture 2
picture 3