Good morning all. I am working through a challenge and am struggling to parse the final section.
Here is an example row:
0000001Wii Sports Wii (2006) CAT:Sports PUB:Nintendo $41.49million, $29.02million, $3.77million, $8.46million, $82.74million
And here is my regex so far:
^(\d{7})(.*) (\<\w+\>) \((\d{4})\) CAT:(.*) PUB:(.*) \$?(.*)
What I'm struggling with is separating the sales from the publisher. Since publisher names can contain more than one word, I wanted to split on the first appearance of the dollar sign. I thought my lazy quantifier would achieve this, but I can never get it to work. Instead, it splits at the last dollar sign, so the final group holds only the last value (82.74million).
I could approach the problem in a different way, but long term I really want to understand lazy and greedy quantifiers and why I can never get them to work. Thanks, community, for any insights, and have a beautiful Thursday!
Hey @tristank, you need a '?' inside the PUB capture group, right after the '.*' and before the '\$', i.e.:
^(\d{7})(.*) (\<\w+\>) \((\d{4})\) CAT:(.*) PUB:(.*?) \$?(.*)
Before, the PUB:(.*) was acting greedy, taking as much as possible and therefore going up to the last $. Now, it acts lazy, taking as little as possible and just going to the first $. Hope this helps!
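For anyone who wants to see the difference outside the original tool, here is a minimal Python sketch, not the original workflow. It assumes Python's `re` module, so the `\<` and `\>` word boundaries are swapped for `\b`. The `strict` pattern and the "Some Publisher" row are hypothetical additions for illustration. Note that the `?` in `\$?` only makes the dollar sign optional; it is the `?` after `.*` that makes a quantifier lazy.

```python
import re

# Sample row from the question.
row = (
    "0000001Wii Sports Wii (2006) CAT:Sports PUB:Nintendo "
    "$41.49million, $29.02million, $3.77million, $8.46million, $82.74million"
)

# Shared prefix. Assumption: \b replaces \< and \>, which Python's re does not support.
PREFIX = r"^(\d{7})(.*) (\b\w+\b) \((\d{4})\) CAT:(.*) PUB:"

greedy = re.compile(PREFIX + r"(.*) \$?(.*)")   # pattern from the question
lazy = re.compile(PREFIX + r"(.*?) \$?(.*)")    # pattern from the reply
strict = re.compile(PREFIX + r"(.*?) \$(.*)")   # hypothetical variant: "$" is required


def pub_and_sales(pattern, text):
    """Return the (publisher, sales) capture groups, or None if there is no match."""
    m = pattern.match(text)
    return (m.group(6), m.group(7)) if m else None


# Hypothetical row with a two-word publisher, made up for this example.
two_word = row.replace("PUB:Nintendo", "PUB:Some Publisher")

for name, pattern in [("greedy", greedy), ("lazy", lazy), ("strict", strict)]:
    print(name)
    print("  one-word publisher:", pub_and_sales(pattern, row))
    print("  two-word publisher:", pub_and_sales(pattern, two_word))
```

On the sample row, the greedy pattern leaves only the last sales value in the final group, while the lazy pattern stops after "Nintendo". With the made-up two-word publisher, the lazy pattern stops at the first space because `\$?` is allowed to match nothing, so requiring the dollar sign (the `strict` variant) may be the safer choice when publisher names can contain spaces.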
Try this.
(\d+)(\w.*?)\s+(\w+)\s+\((\d+)\).*?CAT:(\w+).*?PUB:(\w+(?:\s+\w+)*)\s+(\$[\d.]+million)(?:,\s*)?(\$[\d.]+million)?(?:,\s*)?(\$[\d.]+million)?(?:,\s*)?(\$[\d.]+million)?(?:,\s*)?(\$[\d.]+million)?
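As an alternative to listing one optional group per sales figure, the values can be pulled out in a second pass once the sales text has been isolated. This is only a sketch in Python, not the approach posted above, and it assumes every figure has the form "$<number>million" as in the sample row.

```python
import re

# Sales portion of the sample row from the question.
sales_text = "$41.49million, $29.02million, $3.77million, $8.46million, $82.74million"

# findall returns the captured number for every "$<number>million" occurrence,
# so the pattern does not need to know in advance how many figures there are.
sales = [float(value) for value in re.findall(r"\$(\d+(?:\.\d+)?)million", sales_text)]

print(sales)  # [41.49, 29.02, 3.77, 8.46, 82.74]
```

The trade-off is that this takes two steps (isolate the sales text, then split it), whereas a single pattern with one group per figure puts each value in its own output column.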
Thanks @FinnCharlton, that makes a lot of sense. I guess I was thinking it would just go to the first '$'. Will hit you up on convo next time I have a regex problem ;)
And thanks @cjaneczko, I learned a lot reading through your solution!