Hi, I am hoping to parse the following data that I receive. I am familiar with the text to columns tool, and semi familiar with the reg ex tool, but unsure how to parse this data exactly how I would like to:
Jenna Applepie/Disco Ball: White Noise 41.0k -3% [AUDIO STREAMS 11.6m -4%, ALBUM 1.9k -13%](c)
In which I am hoping for to be broken out in 8 columns:
Jenna Applepie
Disco Ball: White Noise
41.0k - 3%
AUDIO STREAMS
11.6m -4%
ALBUM
1.9K - 13%
c
I am able to break the columns using specific delimters like the "/" and "[", but seperating tAny assistance is much appreciated!
Thank you,
Jenna
Solved! Go to Solution.
Hi @jennahecht,
It's a little hard to ensure we're capturing all the idiosyncrasies with one line but this might work for you, you'll want to use the Regex tool in parse mode.
(.*?)\/(.*?)\s(\d.*?%).*?\[(.*?)\s(\d.*?%).*?(\w+).*?(\d.*?%).*?\((.*?)\)
If this solves your issue please mark the answer as correct, if not let me know!
Kind regards,
Jonathan
Hi @jennahecht
Can you provide more than 1 line of input? As many as you can would be best? Having more data to examine will help us find any patterns that are common across the rows
Dan
Sure - here are 13 lines of data - the information is sensitive so ignore my lack of creativity with these lines 😄
1) Jenna Gabrielle/Good Times: The Fun Album 11.0k -6% [AUDIO STREAMS 11.6m -1%, ALBUM 45k -99%](s) |
2) Robert Dean/Vail Colorado Skiing 99.9k -1% [AUDIO STREAMS 40.2m -1%, ALBUM 1.0k +9%](u) |
3) Blake Larry/Future Crying 16.5k +11% [AUDIO STREAMS 18.9m +16%, ALBUM 1.1k +41%](u) |
4) Discoball/After Clock 11.1k +4% [AUDIO STREAMS 44.6m -1%, ALBUM 4.2k +46%](u) |
5) Winne the Pooh/Love Party 90.4k -6% [AUDIO STREAMS 98.4m -2%, ALBUM 0.1k -19%](w) |
6) Carol Baskin/When It Rains… It Pours 90.9k debut [AUDIO STREAMS 92.0m debut, ALBUM 1.9k debut] D1C |
7) Big Mama/Your Turn 19.4k +1% [AUDIO STREAMS 96.8m +1%, ALBUM 0.4k +16%](u) |
😎 Papa Bear/Dog Days 19.1k -2% [AUDIO STREAMS 92.4m -6%, ALBUM 0.1k -1%](u) |
9) J Money/Workflow Party 12.5k +52% [AUDIO STREAMS 11.2m +90%, ALBUM 10.1k +115%](u) |
10) Ignore Names/You See Me Now 16.8k -4% [AUDIO STREAMS 90.4m -4%, ALBUM 1.4k +9%] |
11) Luke Neck/The Sun and the Moon . . . 16.2k debut [AUDIO STREAMS 1.6m debut, ALBUM 14.9k debut](w) D1C |
12) Brothers/Night Time 16.4k debut [AUDIO STREAMS 11.1m debut, ALBUM 16.0k debut](u) D1C |
13) Mom and Dad/Money Machine 16.0k -9% [AUDIO STREAMS 96.9m -9%, ALBUM 1.1k -10%](u) |
This did not work for me
I think this is best done in a combo of regex and text to columns. There are lot of specific issues which pop up and I'm not sure if they are characteristic of the datasource or one off things. I hardcoded around a few of them and my regex is by no means perfect. To do a proper regex parse on this with all of the exceptions (different spaces and usage of regex special characters in the text) would be exceptionally long I think. I regexed out the artist/album title and then combined them back with the remainder which I text to columned out via splitting on space. Also - If "Audio streams" and "album" are static in every entry then you don't really need to parse them since you can just throw them back in as dummy column later. No idea how the parsing would work with album titles or artist names with splits in them...
Then I tried to clean it up as best as I could.
Hope this helps or at least points you in the right direction...
Hi @jennahecht
The challenge with this data is some have parenthesis at the end and some don't
So I did it in two steps.
RegEx tool with the Parse method - .*\)\s+?(.*)\/(.*?)\s+?(\d.*)\s+?\[(.*)\s+?(\d.*),\s(.*?)\s+(\d.*?)\](.*$)
Next a formula tool on the final column. IIF(Contains([RegExOut8], "("),REGEX_Replace([RegExOut8], "\((.*)\)", "$1"),"")
Let me know if that works.
Cheers!
Esther
This works! Thank you so much
& this works as well! thank you