So here is my version. I guess I could do a group-by on the challenge numbers to count the comments. I couldn't work out whether each comment was positive or negative, though, because the data doesn't have those features. Still, I'm pretty happy with the data cleansing.
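For anyone wanting to try the group-by idea outside Alteryx, here is a minimal Python sketch. It does roughly what a Summarize tool set to Group By on the challenge number plus Count would do. The records, the tuple layout and the function name are hypothetical placeholders, not the challenge data.

```python
from collections import Counter

# Hypothetical cleansed records: (challenge_number, comment_text).
# Values are illustrative only, not taken from the challenge data.
records = [
    (101, "Nice one"),
    (101, "Here is my solution"),
    (102, "Tricky regex"),
]

def count_comments_by_challenge(rows):
    """Group by challenge number and count the comments in each group."""
    return dict(Counter(challenge for challenge, _ in rows))

print(count_comments_by_challenge(records))  # {101: 2, 102: 1}
```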
Like many others, my numbers weren't exact, but I was actually pretty happy with my output (validated with some random sampling). I used a lot of RegEx tools and sampling to keep testing different cases. In hindsight, once I realized the text was wrapped in <p> tags, I could likely have eliminated everything around it. Getting rid of empty rows and nulls brought my row count down a lot.
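The "keep only what's inside the <p> tags, then drop empties and nulls" idea might look something like this in Python. It's a rough sketch, not the workflow above. It assumes the bodies are HTML strings (or None for nulls), and the regex is a simple pattern that won't cope with nested or malformed tags. The function names are made up for illustration.

```python
import re
from typing import Optional

# Text inside each <p ...>...</p> pair; \b stops it matching tags like <pre>.
P_TAG = re.compile(r"<p\b[^>]*>(.*?)</p>", re.DOTALL | re.IGNORECASE)

def extract_paragraphs(html: Optional[str]) -> list[str]:
    """Return the stripped text of every <p> element, skipping nulls and empty paragraphs."""
    if html is None:
        return []
    texts = (m.strip() for m in P_TAG.findall(html))
    return [t for t in texts if t]

def clean_rows(bodies: list[Optional[str]]) -> list[str]:
    """Flatten all bodies into one row per non-empty paragraph."""
    rows: list[str] = []
    for body in bodies:
        rows.extend(extract_paragraphs(body))
    return rows

print(clean_rows(['<div><p>Hello</p><p> </p><p class="x">World</p></div>', None, ""]))
# ['Hello', 'World']
```

Dropping the empty paragraphs and null bodies is what shrinks the row count, so expect fewer rows out than went in.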
Done. You can get 16,325 by replacing the empty text in the [body] field with something other than empty/Null(). The concatenate then keeps it as its own piece, and the RegEx split on \n picks it up as well.
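Here is a small Python sketch of why the placeholder trick keeps the count intact. It assumes the concatenate step skips empty/null values and that no body contains a newline of its own. The placeholder text and function names are hypothetical; any non-empty marker that never appears in real comments would do.

```python
import re
from typing import Optional

PLACEHOLDER = "[empty]"  # hypothetical marker for empty/null bodies

def fill_empty(body: Optional[str]) -> str:
    """Replace a null or empty body with the placeholder so it survives later steps."""
    return body if body else PLACEHOLDER

def concatenate(values: list[Optional[str]], sep: str = "\n") -> str:
    """Join values with a separator, skipping null/empty ones (the assumed row-dropping behaviour)."""
    return sep.join(v for v in values if v)

def split_lines(text: str) -> list[str]:
    """Split the concatenated text back into pieces on \\n, like a RegEx tokenize."""
    return re.split(r"\n", text) if text else []

bodies = ["first comment", None, "", "last comment"]

lost = split_lines(concatenate(bodies))
kept = split_lines(concatenate([fill_empty(b) for b in bodies]))
print(len(bodies), len(lost), len(kept))  # 4 2 4
```

Without the placeholder, the two empty rows vanish in the concatenate. With it, every input row comes back out as its own piece.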
This ensures 16,325 in and 16,325 out. I've tweaked the solution provided by Alteryx to better fit this criterion, and it makes more sense now.