This site uses different types of cookies, including analytics and functional cookies (its own and from other sites). To change your cookie settings or find out more, click here. If you continue browsing our website, you accept these cookies.
I'm attempting to use RegEx to parse a comment field into multiple blocks, then find the commenter, time, and comment. I'm having trouble with the RegEx to break out the comment blocks at the moment. Does anyone know how to best solve this problem or have some links for leads?
Details on the problem below!
(abcd0ef.abcd0ef - 1/1/2019 2:00 PM) Random Free Text (xyz0ef.abc0ef - 1/2/2019 1:00 AM) Response
(a1cd0ef.abcd0ef - 12/1/2018 2:00 PM) Random Free Text (l0n0op.ghi0yf - 12/31/2018 1:00 AM) Other Random Free Text (system administrator - 1/3/2019 6:00 PM) More text
(Donny Darko - d10d00.hai0sd - 12/15/2018 8:00 AM) Other text (System Administrator - 12/15/2018 8:31 AM) Notes notes notes
1/1/2019 2:00 PM
Random Free Text
1/2/2019 1:00 AM
12/1/2018 2:00 PM
Random Free Text
12/31/2018 1:00 AM
Other Random Free Text
1/3/2019 6:00 PM
12/15/2018 8:00 AM
12/15/2018 8:31 AM
Notes notes notes
Here's what I am thinking for process:
1. Tokenize comment blocks -- e.g. "(abcd0ef.abcd0ef - 1/1/2019 2:00 PM) Random Free Text"
- This partially works if I use parse instead of tokenize, but I can't get the regex to capture the blocks
- Currently using: (\(.*\))
2. Parse commenter, datetime, comment
Here are links to other posts that have been helpful so far:
Looking at this, part of the challenge is that what delimits the comment blocks for the row parsing also starts the rows. With the tokenize method, you need to specify what defines the delimiter by basically telling Alteryx what not to capture, using the (?:). You put in the delimiting character/string after the :. If there are more than 1, use the | for "or"
So, for the first RegEx parse tool, I'm using the tokenize method, split to rows, with the following as the Regular Expression: (.*?)(?:\(|$). This tells the tool to grab everything that's not either an end of the field, or an opening parenthesis. And the opening parenthesis or end of the field are the characters that delimit the rows.
The second RegEx parse uses the Parse method, with the Regular expression of (.+)\s-\s(.+?M)\)\s(.+$)
Finally, a formula tool replaces "Donnie Darko" with the code representing his name. I'm sure there's a way to do that in the second RegEx parse tool, just didn't figure it out.