Weekly Challenge


Challenge #56: Parsing and Counting Hashtags

Alteryx Community Team

 

Last week's solution can be found HERE.

 

 

This week's challenge, by its title, sounds like a simple task. However, a quick look at the data will show that it takes more than a single Summarize tool. If you have been meaning to finally learn the RegEx tool, this exercise is a great first foray, since it needs only a relatively simple expression. In this exercise, we will take a set of very ugly data, parse it to find the hashtags within a text field, and work out how many times each hashtag was written and which users used it.

 

If you would like to learn more about our RegEx tool, check out our help on the tool. If you would like a simpler RegEx overview, check out this interactive learning resource.

 

Good luck!

 

 

 

 

 

Alteryx Certified Partner

I'm not sure the results provided are correct as they show a #followme result against id_str_13, but that text does not contain a hashtag, unless I'm missing something?

 

That aside, my solution is as follows:

 

Spoiler
I first created a RecordID field, then used the RegEx tool to parse the text field with the following expression:

#.*?\>

This matches each hashtag, from the # symbol to the end of the attached word. I used split to rows to get one row per hashtag. I then filtered out the null rows, summarised to get a count of the hashtags per id_str, and used crosstab to pivot the data.
I then replaced nulls with zeros using the "Data Cleansing" tool and created the total.
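
For anyone who wants to check the logic outside Alteryx, here is a rough Python sketch of the same steps (tokenize, count per id_str, pivot, fill zeros, total). It is not the workflow itself. The sample rows and the function names are made up for illustration, and because Python's re module has no \> token, the end-of-word check is emulated with lookarounds.

```python
import re
from collections import Counter, defaultdict

# Hypothetical sample rows standing in for the challenge data (id_str, text).
rows = [
    ("id_str_1", "Great day #followme and #alteryx fans, #followme!"),
    ("id_str_2", "No tags here"),
    ("id_str_3", "Learning #alteryx"),
]

# Python's re has no \> (end of word), so emulate it with lookarounds:
# the previous character is a word character and the next one is not.
HASHTAG = re.compile(r"#.*?(?<=\w)(?!\w)")

def count_hashtags(rows):
    """Tokenize each text into hashtags and count them per id_str."""
    counts = defaultdict(Counter)
    for id_str, text in rows:
        for tag in HASHTAG.findall(text):
            counts[tag][id_str] += 1
    return counts

def pivot(counts, ids):
    """One row per hashtag, one column per id_str, zeros instead of nulls."""
    table = {}
    for tag, per_id in counts.items():
        row = {i: per_id.get(i, 0) for i in ids}
        row["Total"] = sum(per_id.values())
        table[tag] = row
    return table

counts = count_hashtags(rows)
table = pivot(counts, [r[0] for r in rows])
for tag, row in table.items():
    print(tag, row)
```

Records with no hashtag simply contribute nothing, which plays the role of the null filter in the workflow.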

Solution.PNG


results.PNG


Alteryx Certified Partner

Of course I wanted to solve this without the use of RegEx (just being difficult).  I don't agree that id_str_13 should have any counts.

 

Spoiler
Screen Shot 2017-02-06 at 12.18.02 PM.png
Alteryx ACE & Top Community Contributor

Chaos reigns within. Repent, reflect and reboot. Order shall return.
Alteryx Community Team

Good catch @mceleavey and @MarqueeCrew, I have updated the start file to have the correct expected results.

7 - Meteor

Nice challenge, have a great week guys!

 

Spoiler
Capture.PNG

 

 

Alteryx Certified Partner

 

 

Spoiler
CompletedWorkbook.PNG

 

 


I used \#[a-zA-Z\d]+ to grab the hashtags and tokenize them into rows. The biggest difference is that I am using the multi-field formula tool to rename and to return binary results for hashtag/string flags, in contrast to a formula and dynamic rename. As always, there's more than one way to shine a penny!

I have attached my work for a deep dive-- fun stuff! 
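
As a rough illustration of the idea outside Alteryx, the Python sketch below tokenizes with the same character class and builds a 1/0 flag per hashtag per record. The records and the hashtag_flags helper are hypothetical, and reading "binary results" as one flag column per hashtag is my assumption about the workflow rather than a description of it.

```python
import re

# Same character class as in the post: letters and digits only.
TOKEN = re.compile(r"#[a-zA-Z\d]+")

# Hypothetical records standing in for the challenge data.
records = {
    "id_str_1": "Loving #Alteryx and #regex_fun today",
    "id_str_2": "plain text, no tags",
}

def hashtag_flags(records):
    """Return {record id: {hashtag: 1 or 0}} for every hashtag seen anywhere."""
    tokens = {rid: TOKEN.findall(text) for rid, text in records.items()}
    all_tags = sorted({t for tags in tokens.values() for t in tags})
    return {
        rid: {tag: int(tag in tags) for tag in all_tags}
        for rid, tags in tokens.items()
    }

flags = hashtag_flags(records)
for rid, row in flags.items():
    print(rid, row)
```

Note that the class stops at the underscore, so "#regex_fun" is tokenized as "#regex". Whether that is what you want depends on how the data writes its hashtags.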

 

5 - Atom

Hello,

 

I would like to know how the expression #.*?\> works. I was also trying to solve the challenge using RegEx tool but can't seem to figure out the correct expression to use. Appreciate your feedback on this :-)

Alteryx Certified Partner

In regex, the .*? part means "match any characters, but as few as possible". From the dropdown you'll see a list of Regex components that allow you to construct expressions like the one in the example.

The # here, followed by the .*?, means take everything from the # symbol up to...

I then used \>, which means "the end of the word".

So, in plain speech, it translates to "take everything from the # symbol to the end of the first word that follows".
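
If you want to see why the ? matters, here is a small Python sketch comparing the lazy .*? with the greedy .* on a made-up sentence. Python's re module does not support \>, so the END_OF_WORD lookaround below is my stand-in for it, not Alteryx syntax.

```python
import re

# Made-up sample text with two hashtags.
text = "Please #followme and #retweet now"

# Stand-in for \> (end of word): previous character is a word character,
# next character is not (or the string ends).
END_OF_WORD = r"(?<=\w)(?!\w)"

# Lazy: stop at the first end of word after each #.
lazy = re.findall(r"#.*?" + END_OF_WORD, text)

# Greedy: run as far as possible, then back off to the last end of word.
greedy = re.findall(r"#.*" + END_OF_WORD, text)

print(lazy)    # ['#followme', '#retweet']
print(greedy)  # ['#followme and #retweet now']
```

The lazy version gives one match per hashtag, which is what the tokenize step needs. The greedy version swallows everything from the first # to the last word in the text.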

 

Regex really isn't as scary as it first appears. I myself had no Regex experience before using Alteryx but it's really easy once you get your head around it.

 

Try experimenting with the dropdown list:

 

screenshot.PNG

 

Feel free to message me if you have any questions or if you have a specific issue with which you're struggling.

 

5 - Atom

Thank you so much for the explanation. I will surely reach out to you guys here in the forum should I need help on anything about Alteryx. Thanks

Alteryx Certified Partner

Also, a great reference for experimentation is regexr.com. You can paste your own test text in the text box and get real-time feedback on how your regular expression evaluates on that text.