Weekly Challenges

JoeM · ‎02-06-2017

Last week's solution can be found HERE.

This week's challenge sounds simple from its title, but a quick look at the data shows that a Summarize tool alone will not be enough. If you have been meaning to learn the RegEx tool, this exercise is a great first foray with a relatively simple expression. We will take a set of very ugly data, parse it to find the hashtags within a text field, and work out how many times each hashtag was written and which users used it.

If you would like to learn more about our RegEx tool, check out our help on the tool. If you would like a simpler RegEx overview, check out this interactive learning resource.

Good luck!

mceleavey · ‎02-06-2017

I'm not sure the results provided are correct as they show a #followme result against id_str_13, but that text does not contain a hashtag, unless I'm missing something?

That aside, my solution is as follows:

Spoiler

I first created a RecordID field, then used Regex to parse the text field using the following:

#.*?\>

This splits the text by a hashtag and the end of the attached word. I used split to rows to provide a row per hashtag. I then filtered out the null rows, summarised and provided a count of the hashtags per id_str and used crosstab to pivot the data.
I then replaced nulls with zeros using the "Data Cleansing" tool and created the total.
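For readers who want to try the same flow outside Alteryx, here is a rough Python sketch of the steps above (parse, split to rows, summarise, crosstab, fill zeros, total). The sample rows are made up, not the challenge data. Python's re module has no \> token, so (?<=\w)(?!\w) is used here as an assumed stand-in for "end of word". Keeping an all-zero column for ids without hashtags is also an assumption.

```python
import re
from collections import Counter, defaultdict

# Hypothetical sample rows standing in for the challenge's id_str and text fields.
rows = [
    ("id_str_1", "Great day #sun #fun"),
    ("id_str_2", "More #fun, always #fun"),
    ("id_str_3", "no tags here"),
]

# Assumed Python equivalent of #.*?\> : (?<=\w)(?!\w) emulates "end of word".
HASHTAG = re.compile(r"#.*?(?<=\w)(?!\w)")

# "Split to rows": one (hashtag, id_str) pair per match; rows with no match drop out.
pairs = [(tag, id_str) for id_str, text in rows for tag in HASHTAG.findall(text)]

# "Summarize": count each hashtag per id_str.
counts = defaultdict(Counter)
for tag, id_str in pairs:
    counts[tag][id_str] += 1

# "Cross Tab", with missing cells filled with zero and a Total column added.
ids = [id_str for id_str, _ in rows]
table = {
    tag: {**{i: per_id.get(i, 0) for i in ids}, "Total": sum(per_id.values())}
    for tag, per_id in counts.items()
}

for tag, row in sorted(table.items()):
    print(tag, row)
```

Each printed row is one hashtag with a count per id_str and a total, which is the same shape the crosstab produces.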

MarqueeCrew · ‎02-06-2017

Of course I wanted to solve this without the use of RegEx (just being difficult). I don't agree that id_str_13 should have any counts.

JoeM · ‎02-06-2017

Good catch @mceleavey and @MarqueeCrew, I have updated the start file to have the correct expected results.

Max06270 · ‎02-06-2017

Nice challenge, have a great week guys!

Harbinger · ‎02-08-2017

Spoiler

I used \#[a-zA-Z\d]+ to grab the hashtags and tokenize them into rows. The biggest difference is that I am using the Multi-Field Formula tool to rename and to return binary results for hashtag/string flags, in contrast to a Formula tool and a Dynamic Rename. As always, there's more than one way to shine a penny!

I have attached my work for a deep dive. Fun stuff!
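As a rough illustration of the tokenize-then-flag idea outside Alteryx, here is a minimal Python sketch. The sample text is invented, and the flag layout (one 1/0 value per hashtag per id_str) is my assumption about what the binary results look like, not a copy of the attached workflow.

```python
import re

# Hypothetical sample rows; the text is made up for illustration.
rows = {
    "id_str_1": "Loving the #Alteryx community #regex",
    "id_str_2": "#regex is fun, #regex2017 too",
}

# The expression quoted above: a literal # followed by one or more letters or digits.
TOKEN = re.compile(r"\#[a-zA-Z\d]+")

# Tokenize: one list of hashtags per id_str.
tokens = {id_str: TOKEN.findall(text) for id_str, text in rows.items()}

# Binary flags: 1 if the id_str used the hashtag at least once, else 0.
all_tags = sorted({tag for tags in tokens.values() for tag in tags})
flags = {
    id_str: {tag: int(tag in tags) for tag in all_tags}
    for id_str, tags in tokens.items()
}

for id_str, row in flags.items():
    print(id_str, row)
```

One thing to keep in mind with this character class: it excludes the underscore, so in Python a tag like #follow_me would be captured as #follow.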

jury_maggay · ‎02-13-2017

Hello,

I would like to know how the expression #.*?\> works. I was also trying to solve the challenge using the RegEx tool but can't seem to figure out the correct expression to use. I'd appreciate your feedback on this :-)

mceleavey · ‎02-13-2017

In regex, . matches any single character, * repeats it zero or more times, and the trailing ? makes the match lazy, so the .*? part means "take as few characters as possible until the rest of the expression can match". The dropdown in the RegEx tool lists expression elements that let you construct patterns like the one in the example.

The # here, followed by the .*?, simply means take everything from the # symbol to...

I then used the \>, which means "the end of the word".

So, in speech terms, it literally translates to "take everything from the # symbol to the end of the word".
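The "from the # symbol to the end of the word" reading can be checked with a small Python sketch. Note that Python's re module has no \> token, so (?<=\w)(?!\w) is an assumed emulation of "end of word" here, and the sample text is made up. The sketch also shows why the ? after .* matters.

```python
import re

# Made-up sample text with two hashtags.
text = "Check this out #followme, thanks! #fun"

# Greedy .* runs to the end of the text first, so it overshoots the first word.
greedy = re.findall(r"#.*(?<=\w)(?!\w)", text)

# Lazy .*? stops at the first position where the end-of-word test succeeds.
lazy = re.findall(r"#.*?(?<=\w)(?!\w)", text)

print(greedy)  # ['#followme, thanks! #fun']
print(lazy)    # ['#followme', '#fun']
```

Without the ?, the expression swallows everything from the first # to the last word in the text, which is why the lazy form is the one to use for pulling out individual hashtags.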

Regex really isn't as scary as it first appears. I myself had no Regex experience before using Alteryx but it's really easy once you get your head around it.

Try experimenting with the dropdown list.

Feel free to message me if you have any questions or if you have a specific issue with which you're struggling.

jury_maggay · ‎02-13-2017

Thank you so much for the explanation. I will surely reach out to you guys here in the forum should I need help with anything in Alteryx. Thanks!

Harbinger · ‎02-13-2017

Also, a great reference for experimentation is regexr.com. You can paste your own test text in the text box and get real-time feedback on how your regular expression evaluates on that text.

Challenge #56: Parsing and Counting Hashtags