Hi all,
Currently have a dataset that looks something like:
Location 1 | Location 2 | Location 3 | ... | Location N | |
Response 1 | Never been | Never been | Been once | ... | Been once |
Response 2 | Never been | Been many times | Been once | ... | Never been |
Response 3 | ... | ... | ... | ... | ... |
... | ... | ... | ... | ... | ... |
Response M | ... | ... | ... | ... | ... |
I want to eventually generate a network view where nodes represent locations and edges represent someone having been to both locations at least once. Nodes will be sized by number of times a respondent has been at least once, and edges will be weighted by the number of times respondents have been to both locations at least once.
The roundabout way of generating this is to use Formula to generate new columns, e.g. column "Edge 1-2" will be created using:
IF Contains([Location 1], "Never", 1) OR Contains([Location 2], "Never", 1) THEN 0 ELSE 1 ENDIF
I can then use the summarize tool to get the weights.
However, I have 15 locations, which makes 105 columns, which is a lot of work and not particularly efficient.
Can anyone think of a way to do this more quickly/smoothly in Alteryx? I'm new to the program and not certain of its capabilities yet. If I can't, I can code it up in Python quite easily, but I'd rather develop a fully Alteryx model for continuity and clarity.
Thanks!
Solved! Go to Solution.
Hi,
Really quick question regarding this - you're basically describing that for location 1, it will have 14 "edges", those being:
Edge 1-2
Edge 1-3
Edge 1-4
Edge ...
Edge 1-15
And that you will then create that same dataset for each point?
What I'm wondering is, are you looking for feedback to make writing this more dynamic/scalable, or feedback on an alternative method of building this out?
Assuming you're looking for a methodology to simply build all 105 edges dynamically, I built out an example (attached).
Basically, using the transpose and cross tab tools you can build these dynamically using only two formulas instead of 105.
You have to do some filtering and joining which probably slows this down if we're talking about millions of records, but it should be a fairly useful starting point, at the least.
The biggest advantage to this is that if you add or take away a location, you won't have to adjust the workflow at all.
Hi,
Thanks for your response! Yes, Node 1 could have up to 15 edges from it (Edge 1-2, 1-3, 1-4, ..., 1-15); then Node 2 could have the same (2-3, 2-4, ...), continued through Node 15. It would be a visual representation of how often people who go to one location go to another.
I'd like to make it more scalable, since I have to find the weight of each edge for the 15 nodes (15 choose 2 possibilities).
Thanks!
This is amazing! Thank you so much -- I think this is exactly what I was looking for!