Hello Alteryx Community!
I'm taking a deep dive into training curriculum assignment data trying to identify curricula that are typically assigned together. I am trying to do this by identifying recurring sequences or patterns in my data.
I've gotten my curriculum names converted to unique numeric ids (using Tile), concatenated in one column next to the User ID (using Summarize). I've converted User ID to a numeric record ID so that I can share the data.
Example: Everyone assigned Curriculum 266 is also assigned 275, but not everyone assigned curriculum 275 is assigned 266. I'd like to identify what other curricula are frequently assigned together.
Another Example of some recurrences/patterns, I've highlighted some of the curriculum ids that are commonly occurring together in these 4 rows:
I have a hunch that there is a regex formula out there that will do this, but I wasn't able to find it when I went digging.
Thank you!
Lizzy
Note: I've used a unique tool to identify groups of users that have all the same curricula, but now I am trying to look at recurring sequences within the rows, as opposed to the entire unique sequence for a user.
Solved! Go to Solution.
@AverageLizzy This doesn't seem to me at first glance to be something that RegEx is going to be well suited for. As I look through your data, I note of the 3200 rows and 401 unique curriculum that you will have to basically identify roughly 161K unique pairs in your data and then sort for frequency of matches in your 3200 rows. I could be misunderstanding or missing something, but I think that is the route you will have to take.
I would love to be wrong, but I'm not certain this will be an easy lift. I calculated roughly 655K pairs in your record IDs and I think you would need to cross correlate those to the 161K possible pairs to begin sorting for match frequency. Hoping someone with some serious regression skills can hop on and illuminate an approach for mere mortals. Good luck, hope this helps.
Here's my attempt at getting something useful from this data:
The three browse tools are the three points you could take the data onwards depending on what you want to do. I've matched each course with every other co-occurring course and counted the distinct students for each combination.
The first browse has a simple list showing, for each course, a sorted list of the number of students also taking each other course.
The second browse adds the number of students taking each course and calculates the percentage taking each other course. From either of these two points, you could sample or filter the results to get e.g. top 10 courses also taken by students on this course.
For the third browse I've matched the data to itself again to get both the percentage of course 1 students also taking course 2 and vice versa, then filtered to combinations with both percentages >=75% to show courses that frequently occur together.
I'm certainly not a regression expert and I'm sure others might be able to produce something more robust, but hopefully this gives you something to think about.
Hello mpennington,
Thank you so much for the attempt! I think you are right about regex, it isn't the right fit for this. I'm not able to use this workflow for my current goal but it may come in handy later for some of the other data questions I am trying to answer.
Thank you!
Lizzy
Thank you, Christina!
This is perfect! I'll be utilizing the results from the second browse tool to dive into the data and see what curricula can be merged or more efficiently assigned.
Thank you! Thank you! 😊
@AverageLizzy I found this use case interesting, so, although this is an answered question, still I would like to share my approach.
I simply built an Iterative Macro to run through all combinations in a package by package manner and then checked the frequency of both the courses ( in one combination) occurring at the same time. Once the macro finished running, I sorted combinations based on their frequency of occurring together. This can be further enhanced with other KPIs.
Please note, combination with course numbers 10,57,58 would provide incorrect results, please convert them to 3 digit course numbers like 010,058,057 to get the correct result. Otherwise, when I tried to match string, 10 was matching with 510.
Please share your feedback and let me know if you have any questions.
Arundhuti
Thank you, @Tyro_abc! I've never worked with macros before so I'm excited to give this a shot!
User | Count |
---|---|
19 | |
14 | |
13 | |
9 | |
8 |