My output is slightly different, but it reads very well, so I'm happy to submit.
Like many others, my numbers weren't exact, but I was actually pretty happy with my output (validated with some random sampling). I used a lot of regex tools and sampling to keep testing different cases. In hindsight, once I realized the text was wrapped in P's, I could likely have eliminated everything around it. Getting rid of empty rows and nulls brought my row count down by a lot.
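For anyone trying the same idea, here is a rough Python sketch (not the Alteryx workflow itself): keep only the text inside the paragraph tags and drop null or empty results. It assumes "P's" means HTML `<p>` tags; the sample rows and the `extract_paragraphs` helper are hypothetical.

```python
import re

# Hypothetical sample rows; assumes post text sits inside HTML <p>...</p> tags
rows = [
    "<div><p>Great challenge!</p><img src='x.png'/></div>",
    "<p></p>",
    None,
    "<p class='x'>Used RegEx to parse</p><p>then filtered nulls</p>",
]

# <p\b[^>]*> matches <p> or <p ...attrs>, but not <pre>; non-greedy body
P_TAG = re.compile(r"<p\b[^>]*>(.*?)</p>", re.IGNORECASE | re.DOTALL)

def extract_paragraphs(rows):
    out = []
    for row in rows:
        if not row:            # drop null/empty rows up front
            continue
        for text in P_TAG.findall(row):
            text = text.strip()
            if text:           # drop paragraphs that are empty after trimming
                out.append(text)
    return out

print(extract_paragraphs(rows))
# ['Great challenge!', 'Used RegEx to parse', 'then filtered nulls']
```

Filtering empties after extraction (rather than only before) is what removes the `<p></p>` rows that would otherwise inflate the count.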
Hello.
The REGEX shown in the proposed solution is instructive, thank you.
However, the Part 1 solution record counts do not tie out.
On the left, the record count after the regex and the filter for BODY, and before the summation, is 16325.
On the right, the input record count coming into the file is 16319.
Where did the six records go?
Since a number of posters note that their counts did not "quite" tie out, might the admins review the posted solution?
Thanks.
Done. You can get 16,325 by changing empty values in the [body] field to something other than empty/null(). The concatenate will then keep each one as its own piece, and the regex (\n) will pick it up as well.
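To see why the six records disappear, here is a small Python model of the concatenate-then-split round trip. It assumes the concatenation step skips empty/null values, which is what the counts above suggest; `concat_skip_empty`, `fill_empty`, and the `<EMPTY>` placeholder are hypothetical names, not Alteryx functions.

```python
import re

PLACEHOLDER = "<EMPTY>"  # hypothetical stand-in for an empty [body]

def concat_skip_empty(values, sep="\n"):
    # Assumption: the concatenate step drops null/empty values entirely
    return sep.join(v for v in values if v)

def tokenize_lines(text):
    # Split the concatenated text back into one piece per line (regex on \n)
    return re.findall(r"[^\n]+", text)

def fill_empty(values, placeholder=PLACEHOLDER):
    # Replace null/empty bodies so they survive the concatenation
    return [v if v else placeholder for v in values]

bodies = ["first post", None, "second post", "", "third post"]

lost = tokenize_lines(concat_skip_empty(bodies))
kept = tokenize_lines(concat_skip_empty(fill_empty(bodies)))
print(len(bodies), len(lost), len(kept))  # 5 3 5
```

With the placeholder in place, the count going in matches the count coming out, which is the same idea as filling the empty [body] values before the concatenate.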
This ensures 16,325 in and 16,325 out. I've tweaked the solution given by Alteryx to better fit this criterion; it makes more sense now.
Good practice for using the text mining tools.