Advent of Code is back! Unwrap daily challenges to sharpen your Alteryx skills and earn badges along the way! Learn more now.
Community is experiencing an influx of spam. As we work toward a solution, please use the 'Notify Moderator' option on the ellipsis menu to flag inappropriate posts.
Free Trial

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Reg Ex Parsing Help

jennahecht
6 - Meteoroid

Hi, I am hoping to parse the following data that I receive. I am familiar with the text to columns tool, and semi familiar with the reg ex tool, but unsure how to parse this data exactly how I would like to: 

 

Jenna Applepie/Disco Ball: White Noise 41.0k -3% [AUDIO STREAMS 11.6m -4%, ALBUM 1.9k -13%](c)

 

In which I am hoping for to be broken out in 8  columns: 

 

Jenna Applepie

Disco Ball: White Noise

41.0k - 3%

AUDIO STREAMS

11.6m -4%

ALBUM

1.9K - 13%

 

I am able to break the columns using specific delimters like the "/" and "[", but seperating  tAny assistance is much appreciated!

 

Thank you,

Jenna 

 

 

9 REPLIES 9
Jonathan-Sherman
15 - Aurora
15 - Aurora

Hi @jennahecht,

 

It's a little hard to ensure we're capturing all the idiosyncrasies with one line but this might work for you, you'll want to use the Regex tool in parse mode.

 

(.*?)\/(.*?)\s(\d.*?%).*?\[(.*?)\s(\d.*?%).*?(\w+).*?(\d.*?%).*?\((.*?)\)

 

If this solves your issue please mark the answer as correct, if not let me know!

 

Kind regards,

Jonathan

danilang
19 - Altair
19 - Altair

Hi @jennahecht 

 

Can you provide more than 1 line of input?  As many as you can would be best?  Having more data to examine will help us find any patterns that are common across the rows

 

Dan

jennahecht
6 - Meteoroid

Sure - here are 13 lines of data - the information is sensitive so ignore my lack of creativity with these lines 😄 

 

1) Jenna Gabrielle/Good Times: The Fun Album  11.0k -6% [AUDIO STREAMS 11.6m -1%, ALBUM 45k -99%](s)
2) Robert Dean/Vail Colorado Skiing  99.9k -1% [AUDIO STREAMS 40.2m -1%, ALBUM 1.0k +9%](u)
3) Blake Larry/Future Crying  16.5k +11% [AUDIO STREAMS 18.9m +16%, ALBUM 1.1k +41%](u)
4) Discoball/After Clock  11.1k +4% [AUDIO STREAMS 44.6m -1%, ALBUM 4.2k +46%](u)
5) Winne the Pooh/Love Party  90.4k -6% [AUDIO STREAMS 98.4m -2%, ALBUM 0.1k -19%](w)
6) Carol Baskin/When It Rains… It Pours  90.9k debut [AUDIO STREAMS 92.0m debut, ALBUM 1.9k debut] D1C
7) Big Mama/Your Turn  19.4k +1% [AUDIO STREAMS 96.8m +1%, ALBUM 0.4k +16%](u)
😎 Papa Bear/Dog Days  19.1k -2% [AUDIO STREAMS 92.4m -6%, ALBUM 0.1k -1%](u)
9) J Money/Workflow Party  12.5k +52% [AUDIO STREAMS 11.2m +90%, ALBUM 10.1k +115%](u)
10) Ignore Names/You See Me Now 16.8k -4% [AUDIO STREAMS 90.4m -4%, ALBUM 1.4k +9%]
11) Luke Neck/The Sun and the Moon . . .  16.2k debut [AUDIO STREAMS 1.6m debut, ALBUM 14.9k debut](w) D1C
12) Brothers/Night Time  16.4k debut [AUDIO STREAMS 11.1m debut, ALBUM 16.0k debut](u) D1C
13) Mom and Dad/Money Machine  16.0k -9% [AUDIO STREAMS 96.9m -9%, ALBUM 1.1k -10%](u)
jennahecht
6 - Meteoroid

This did not work for me 

apathetichell
19 - Altair

I think this is best done in a combo of regex and text to columns. There are lot of specific issues which pop up and I'm not sure if they are characteristic of the datasource or one off things. I hardcoded around a few of them and my regex is by no means perfect. To do a proper regex parse on this with all of the exceptions (different spaces and usage of regex special characters in the text) would be exceptionally long I think. I regexed out the artist/album title and then combined them back with the remainder which I text to columned out via splitting on space. Also - If "Audio streams" and "album" are static in every entry then you don't really need to parse them since you can just throw them back in as dummy column later. No idea how the parsing would work with album titles or artist names with splits in them...

 

Then I tried to clean it up as best as I could.

 

Hope this helps or at least points you in the right direction...

echuong1
Alteryx Alumni (Retired)

Try this:

 

(\u.+)\/(\u.+)\s(\d.+)\s\[(\u.+)\s(\d.+)\,\s(\u.+)\s(\d.+)\](.*)

 

echuong1_0-1616457755313.png

 

estherb47
15 - Aurora
15 - Aurora

Hi @jennahecht 

The challenge with this data is some have parenthesis at the end and some don't

So I did it in two steps. 

RegEx tool with the Parse method - .*\)\s+?(.*)\/(.*?)\s+?(\d.*)\s+?\[(.*)\s+?(\d.*),\s(.*?)\s+(\d.*?)\](.*$)

Next a formula tool on the final column. IIF(Contains([RegExOut8], "("),REGEX_Replace([RegExOut8], "\((.*)\)", "$1"),"")

Let me know if that works.

 

Cheers!

Esther

jennahecht
6 - Meteoroid

This works! Thank you so much 

jennahecht
6 - Meteoroid

& this works as well! thank you

Labels
Top Solution Authors