Free Trial

Weekly Challenges

Solve the challenge, share your solution and summit the ranks of our Community!

Also available in | Français | Português | Español | 日本語
IDEAS WANTED

Want to get involved? We're always looking for ideas and content for Weekly Challenges.

SUBMIT YOUR IDEA

Challenge #56: Parsing and Counting Hashtags

JoeM
Alteryx Alumni (Retired)

 

Last week's solution can be found HERE.

 

 

This week's challenge, by its title, sounds like a simple task. However, a quick look at the data will show that it's not going to be just a summarize tool. If you have been meaning to finally learn the RegEx tool, this exercise will be a great first foray with a relatively simple expression. In this exercise, we will take a set of very ugly data, parse it to find hashtags within a text field, find out how many times the hashtag was written, and what users used the hashtag.

 

If you would like to learn more about our RegEx Tool - check out our help on the tool. If you would like a simpler RegEx overview, check this interactive learning out.

 

Good luck!

 

 

 

 

 

mceleavey
17 - Castor
17 - Castor

I'm not sure the results provided are correct as they show a #followme result against id_str_13, but that text does not contain a hashtag, unless I'm missing something?

 

That aside, my solution is as follows:

 

Spoiler
I first created a RecordID field, then used Regex to parse the text field using the following:

#.*?\>

This splits the text by a hashtag and the end of the attached word. I used split to rows to provide a row per hashtag. I then filtered out the null rows, summarised and provided a count of the hashtags per id_str and used crosstab to pivot the data.
I then replaced nulls with zeros using the "Data Cleansing" tool and created the total.

Solution.PNG


results.PNG




Bulien

MarqueeCrew
20 - Arcturus
20 - Arcturus

Of course I wanted to solve this without the use of RegEx (just being difficult).  I don't agree that id_str_13 should have any counts.

 

Spoiler
Screen Shot 2017-02-06 at 12.18.02 PM.png
Alteryx ACE & Top Community Contributor

Chaos reigns within. Repent, reflect and restart. Order shall return.
Please Subscribe to my youTube channel.
JoeM
Alteryx Alumni (Retired)

Good catch @mceleavey and @MarqueeCrew, I have updated the start file to have the correct expected results.

Max06270
7 - Meteor

Nice challenge, have a great week guys!

 

Spoiler
Capture.PNG

 

 

Harbinger
9 - Comet

 

 

Spoiler
CompletedWorkbook.PNG

 

 


I used \#[a-zA-Z\d]+ to grab and tokenize into rows. Biggest difference is I am using the multi-field formula tool to rename and to return binary results for hashtag/string flags. this is in contrast to a formula and dynamic rename. As always, there's more than one way to shine a penny!

I have attached my work for a deep dive-- fun stuff! 

 

jury_maggay
5 - Atom

Hello,

 

I would like to know how the expression #.*?\> works. I was also trying to solve the challenge using RegEx tool but can't seem to figure out the correct expression to use. Appreciate your feedback on this :-)

mceleavey
17 - Castor
17 - Castor

In regex the .*? part simply means "take everything in between". From the dropdown you'll see a list of Regex tools that allow you to then construct things like that seen in the example.

The # here, followed by the .*? simply means take everything from the # symbol to...

I then used the \> which means "the end of the word".

So, in speech terms, it literally translates to "take everything from the # symbol to the end of the word".

 

Regex really isn't as scary as it first appears. I myself had no Regex experience before using Alteryx but it's really easy once you get your head around it.

 

Try experimenting with the dropdown list:

 

screenshot.PNG

 

Feel free to message me if you have any questions or if you have a specific issue with which you're struggling.

 



Bulien

jury_maggay
5 - Atom

Thank you so much for the explanation. I will surely reach out to you guys here in the forum should I need help on anything about Alteryx. Thanks

Harbinger
9 - Comet

Also, a great reference for experimentation is regexr.com. You can paste your own test text in the text box and get real-time feedback on how your regular expression evaluates on that text.