Challenge #56: Parsing and Counting Hashtags

JoeM
Alteryx Alumni (Retired)

 

Last week's solution can be found HERE.

 

 

This week's challenge, by its title, sounds like a simple task. However, a quick look at the data shows that a Summarize tool alone won't be enough. If you have been meaning to finally learn the RegEx tool, this exercise is a great first foray with a relatively simple expression. We will take a set of very ugly data, parse it to find the hashtags within a text field, and then work out how many times each hashtag was written and which users used it.

 

If you would like to learn more about our RegEx tool, check out our help on the tool. If you would like a simpler RegEx overview, check out this interactive lesson.

 

Good luck!

 

 

 

 

 

mceleavey
17 - Castor

I'm not sure the results provided are correct as they show a #followme result against id_str_13, but that text does not contain a hashtag, unless I'm missing something?

 

That aside, my solution is as follows:

 

Spoiler
I first created a RecordID field, then used Regex to parse the text field using the following:

#.*?\>

This matches from each hashtag to the end of the attached word. I used Split to Rows to get one row per hashtag. I then filtered out the null rows, summarised to get a count of the hashtags per id_str, and used Cross Tab to pivot the data.
I then replaced nulls with zeros using the Data Cleansing tool and created the total.
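
For anyone who wants to see the same steps outside of Designer, here is a rough Python sketch of the flow described above (one row per hashtag, count per id_str, pivot, fill zeros, total). The sample rows and the function name hashtag_crosstab are made up for illustration, and since Python's re module has no \> token, the lookaround (?<=\w)(?!\w) is used as a stand-in for "end of word". Treat it as an approximation of the workflow, not the workflow itself.

```python
import re
from collections import Counter, defaultdict

# Hypothetical sample rows (id_str, text); not the challenge's real data.
rows = [
    ("id_str_1", "Loving the new release #alteryx #data"),
    ("id_str_2", "No tags in this one"),
    ("id_str_3", "#data is everywhere, #data!"),
]

# '#' followed lazily by anything up to the first end of a word.
# Assumption: (?<=\w)(?!\w) stands in for \>, which Python's re does not support.
HASHTAG = re.compile(r"#.*?(?<=\w)(?!\w)")


def hashtag_crosstab(rows):
    """Return {hashtag: {id_str: count, ..., 'Total': n}} with zeros filled in."""
    counts = defaultdict(Counter)
    ids = []
    for id_str, text in rows:
        if id_str not in ids:
            ids.append(id_str)
        # One "row" per hashtag; texts without hashtags contribute nothing,
        # which plays the role of filtering out the null rows.
        for tag in HASHTAG.findall(text):
            counts[tag][id_str] += 1
    table = {}
    for tag, per_id in counts.items():
        row = {i: per_id.get(i, 0) for i in ids}  # missing combinations become 0
        row["Total"] = sum(per_id.values())
        table[tag] = row
    return table


table = hashtag_crosstab(rows)
```

With the sample rows above, table["#data"] has a count of 1 for id_str_1, 0 for id_str_2, 2 for id_str_3, and a Total of 3.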

Solution.PNG


results.PNG





MarqueeCrew
20 - Arcturus

Of course I wanted to solve this without the use of RegEx (just being difficult).  I don't agree that id_str_13 should have any counts.

 

Spoiler
Screen Shot 2017-02-06 at 12.18.02 PM.png
JoeM
Alteryx Alumni (Retired)

Good catch @mceleavey and @MarqueeCrew, I have updated the start file to have the correct expected results.

Max06270
7 - Meteor

Nice challenge, have a great week guys!

 

Spoiler
Capture.PNG

 

 

Harbinger
9 - Comet

 

 

Spoiler
CompletedWorkbook.PNG

 

 


I used \#[a-zA-Z\d]+ to grab the hashtags and tokenize them into rows. The biggest difference is that I am using the Multi-Field Formula tool to rename and to return binary results for the hashtag/string flags, in contrast to a Formula tool plus Dynamic Rename. As always, there's more than one way to shine a penny!

I have attached my work for a deep dive. Fun stuff!
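
If it helps to see that expression in action, here is a small Python sketch of tokenizing with \#[a-zA-Z\d]+ and turning the result into 0/1 flags. The helper names tokenize and flags and the sample strings are hypothetical, and the flag step is only my guess at what a binary hashtag flag could look like, not a copy of the attached workflow.

```python
import re

# Same character class as in the post: '#' followed by one or more letters or digits.
TOKEN = re.compile(r"\#[a-zA-Z\d]+")


def tokenize(text):
    """Return every hashtag in text, in order of appearance."""
    return TOKEN.findall(text)


def flags(text, hashtags):
    """Return {hashtag: 1 or 0} depending on whether the hashtag occurs in text."""
    found = set(tokenize(text))
    return {tag: int(tag in found) for tag in hashtags}


tokens = tokenize("Go #Team2017! #win-now")
row_flags = flags("x #a y", ["#a", "#b"])
```

One thing to keep in mind with this character class: it leaves out underscores and punctuation, so a tag like #follow_me would be cut short at #follow.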

 

jury_maggay
5 - Atom

Hello,

 

I would like to know how the expression #.*?\> works. I was also trying to solve the challenge using RegEx tool but can't seem to figure out the correct expression to use. Appreciate your feedback on this :-)

mceleavey
17 - Castor

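In regex, the .*? part simply means "take everything in between": the . matches any character, and *? repeats it as few times as possible before the rest of the expression can match. From the dropdown you'll see a list of RegEx elements that you can use to construct expressions like the one in the example.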

The # here, followed by the .*? simply means take everything from the # symbol to...

I then used \>, which means "the end of the word".

So, in speech terms, it literally translates to "take everything from the # symbol to the end of the word".
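
To make the "as few as possible" part concrete, here is a small Python sketch comparing the lazy .*? with the greedy .* on a made-up sentence. Python's re module does not have \>, so the lookaround (?<=\w)(?!\w) is assumed here as the equivalent of "end of word". The last line shows why a plain \b is not a drop-in replacement: there is already a word boundary directly after the # symbol.

```python
import re

# Hypothetical sample text.
text = "Great day #followme everyone, #sun!"

# Assumption: stand-in for \> (end of word), which Python's re lacks.
END_OF_WORD = r"(?<=\w)(?!\w)"

# Lazy: stop at the FIRST end of word after each '#'.
lazy = re.findall(r"#.*?" + END_OF_WORD, text)

# Greedy: run as far as possible, then back up to the LAST end of word.
greedy = re.findall(r"#.*" + END_OF_WORD, text)

# \b matches at the start of a word too, so the lazy match stops right after '#'.
naive = re.findall(r"#.*?\b", text)
```

Here lazy comes back as ['#followme', '#sun'], greedy as the single match ['#followme everyone, #sun'], and naive as ['#', '#'].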

 

Regex really isn't as scary as it first appears. I myself had no Regex experience before using Alteryx but it's really easy once you get your head around it.

 

Try experimenting with the dropdown list:

 

screenshot.PNG

 

Feel free to message me if you have any questions or if you have a specific issue with which you're struggling.

 




jury_maggay
5 - Atom

Thank you so much for the explanation. I will surely reach out to you guys here in the forum should I need help on anything about Alteryx. Thanks

Harbinger
9 - Comet

Also, a great reference for experimentation is regexr.com. You can paste your own test text in the text box and get real-time feedback on how your regular expression evaluates on that text.