
Alteryx Designer Discussions

Find answers, ask questions, and share expertise about Alteryx Designer.

Counting hashtag combinations




I'm cooking up a little visualisation project and have hit a slight roadblock:

So, I have a data dump of messages (somewhere up in the 7-digit range) with hashtags embedded:

- Amy, don't have a #cow! #farmlife

- I love #Farmlife #vacation #midwest
- Cows & cows & #cow, as far as the eye can see #farmlife #midwest


Now I'd want to count occurrences of all hashtag combinations to see which clusters emerge.
The 3 messages above would then have the following combinations:

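tweet 1: #cow + #farmlife
tweet 2: #farmlife + #midwest, #farmlife + #vacation, #midwest + #vacation
tweet 3: #cow + #farmlife, #cow + #midwest, #farmlife + #midwest
(treating #Farmlife and #farmlife as the same tag)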


after summarizing this would become:

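All combos (pair: count)
#cow + #farmlife: 2
#farmlife + #midwest: 2
#cow + #midwest: 1
#farmlife + #vacation: 1
#midwest + #vacation: 1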


This sounds like the right input for an undirected network graph that visualizes how these hashtags are related and clustered: which are often combined, and which are not.

I'm still at the contemplation, parsing-out-invalid-characters and cleaning stage at this point (human-written messages and CSV format apparently do not play well together!),

but I have a few minor issues / concerns:


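How to extract the pairs?
Messages have anywhere between 1 and 8 hashtags, so anywhere between 0 and 28 pair combinations per tweet (a single hashtag forms no pair),
based on the formula for choosing r items out of n, here with r = 2:

$$\binom{n}{r} = \frac{n!}{r!\,(n - r)!}$$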
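
As a quick sanity check of that formula, here is a minimal Python sketch (outside Alteryx) that evaluates it for pairs, r = 2, over 1 to 8 hashtags per message. The name `pair_counts` is only illustrative.

```python
import math

# Number of unordered hashtag pairs (r = 2) for n hashtags in one message.
# math.comb(n, r) computes n! / (r! * (n - r)!) and needs Python 3.8+.
pair_counts = {n: math.comb(n, 2) for n in range(1, 9)}

for n, pairs in pair_counts.items():
    print(f"{n} hashtags -> {pairs} pairs")
```

A message with a single hashtag yields no pair at all, and 8 hashtags yield 28, so the output table grows by at most 28 rows per message.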

I haven't got much of a clue how to handle this specific dynamic in Alteryx yet.
Have any of you ever had a similar problem?

I think there might be an iterative or batch macro in here, one that

- takes a single tweet
- extracts all hashtags and orders them alphabetically
- somehow builds an array of all combinations <<-- (this is my main problem)
- outputs this into a table with 2 columns (or summarizes them)
- goes to the next tweet, repeats, and appends this result to the previous one,
until all tweets are finished
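
Outside Alteryx, the steps above could be sketched roughly as follows in Python. This is only an illustration of the logic, not an Alteryx macro: the regular expression for what counts as a hashtag, the lower-casing, and the function names `hashtag_pairs` and `count_pairs` are all assumptions made for the example.

```python
import re
from collections import Counter
from itertools import combinations

# Assumption: a hashtag is '#' followed by letters, digits or underscores.
HASHTAG_RE = re.compile(r"#\w+")

def hashtag_pairs(message):
    """Return the alphabetically ordered, unique hashtag pairs of one message."""
    # Lower-case so '#Farmlife' and '#farmlife' count as the same tag;
    # the set drops a hashtag that is repeated within the message.
    tags = sorted({tag.lower() for tag in HASHTAG_RE.findall(message)})
    # combinations() on a sorted list yields each pair once, already ordered.
    return list(combinations(tags, 2))

def count_pairs(messages):
    """Count every hashtag pair over all messages."""
    counts = Counter()
    for message in messages:
        counts.update(hashtag_pairs(message))
    return counts

messages = [
    "Amy, don't have a #cow! #farmlife",
    "I love #Farmlife #vacation #midwest",
    "Cows & cows & #cow, as far as the eye can see #farmlife #midwest",
]

for (tag_a, tag_b), n in count_pairs(messages).most_common():
    print(tag_a, tag_b, n)
```

Because the tags are sorted before the pairs are built, each pair always comes out in the same order, so no separate de-duplication pass is needed afterwards.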


How to de-duplicate:
(there is no directionality!)


to become



(I suspect that, once I get to this point, alphabetically sorting the hashtags before summarizing them may very well clear up this problem, but maybe the pairs should be alphabetically sorted at the extraction stage? I do not know if there is a simple way to do this kind of cross-column compare (and replace?).)


I also haven't figured out yet what to do if someone uses a single hashtag twice. There shouldn't be 2 nodes with the same hashtag/label, so I guess I could integrate a filter based on an expression like [column1#tag] != [column2#tag].
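
To make the two ideas above concrete (ordering each pair alphabetically, and filtering out a hashtag paired with itself), here is a small Python sketch that works on an already extracted two-column table. The function name `normalise_pairs` and the sample rows are made up for illustration.

```python
from collections import Counter

def normalise_pairs(rows):
    """Order each pair alphabetically and drop pairs of a tag with itself."""
    cleaned = []
    for left, right in rows:
        left, right = left.lower(), right.lower()
        if left == right:
            # Same idea as the filter [column1#tag] != [column2#tag].
            continue
        # The cross-column compare: smaller tag first, larger tag second.
        cleaned.append((min(left, right), max(left, right)))
    return cleaned

rows = [
    ("#cow", "#farmlife"),
    ("#farmlife", "#cow"),       # same edge, other direction
    ("#farmlife", "#farmlife"),  # hashtag used twice in one message
]

edge_counts = Counter(normalise_pairs(rows))
print(edge_counts)
```

After normalising, both directions of an edge collapse into one key, so a plain count per pair gives the edge weight for the graph.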


Thought I'd post it here; it seems like an interesting conundrum, and I'm figuring out a lot of the specifics right now by writing it up.
Sadly, I cannot share the original dataset, so I hope it's OK to include a small part of the dataset that was used in weekly challenges #89 and #90 (Analyzing social data),
as it already has similar data and the hashtags have already been split off.

Any 2 cents or pointers for an approach that could work would be #hugely welcome.


Give this one a try:


The top input is your data; the bottom one adds a delimiter so I could keep the #'s. To get the unique pairs, I joined the data back to itself and then removed the records where the two hashtags are the same. I think you're spot on with alphabetizing the hashtags prior to counting them.
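
For anyone who wants to see the self-join idea outside Alteryx, here is a rough Python sketch. It is not the workflow itself: the row layout (message id, hashtag) and the names `self_join_pairs` and `joined_counts` are assumptions, and the single `left < right` test stands in for both the "remove identical hashtags" filter and the alphabetical ordering.

```python
from collections import Counter, defaultdict

# Assumed input: one row per (message id, hashtag), hashtags already split off.
rows = [
    (1, "#cow"), (1, "#farmlife"),
    (2, "#farmlife"), (2, "#vacation"), (2, "#midwest"),
    (3, "#cow"), (3, "#farmlife"), (3, "#midwest"),
]

def self_join_pairs(rows):
    """Join the rows to themselves on message id and keep each pair once."""
    tags_by_message = defaultdict(set)
    for message_id, tag in rows:
        tags_by_message[message_id].add(tag.lower())

    pairs = []
    for tags in tags_by_message.values():
        for left in tags:        # left side of the join
            for right in tags:   # right side of the join
                # left < right removes self-pairs and keeps only the
                # alphabetically ordered copy of each pair.
                if left < right:
                    pairs.append((left, right))
    return pairs

joined_counts = Counter(self_join_pairs(rows))
for (tag_a, tag_b), n in joined_counts.most_common():
    print(tag_a, tag_b, n)
```

The join touches every hashtag of a message with every other one, which is at most 8 x 8 rows per message here, so it stays a set-based operation instead of a per-tweet loop.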


Hope it helps!


Thanks @danrh!

A very clean and elegant solution.

I was making it much harder than it was supposed to be.
Joining the data back onto itself to get the combinations: great idea, never thought of that.

I knew I was missing something when thinking about an iterative or batch macro to do this!
This is probably A LOT faster than iteratively cycling through every row/tweet to extract the tags (600K rows in the base dataset, that could take a while :) ).
The adapted flow now blazes through the entire data set in something like 15-23 seconds!