Hi All, I'm attempting to use RegEx to parse a comment field into multiple blocks, then find the commenter, time, and comment. I'm having trouble with the RegEx to break out the comment blocks at the moment. Does anyone know how to best solve this problem or have some links for leads? Details on the problem below! Input: IDComment1(abcd0ef.abcd0ef - 1/1/2019 2:00 PM) Random Free Text (xyz0ef.abc0ef - 1/2/2019 1:00 AM) Response2(a1cd0ef.abcd0ef - 12/1/2018 2:00 PM) Random Free Text (l0n0op.ghi0yf - 12/31/2018 1:00 AM) Other Random Free Text (system administrator - 1/3/2019 6:00 PM) More text3(Donny Darko - d10d00.hai0sd - 12/15/2018 8:00 AM) Other text (System Administrator - 12/15/2018 8:31 AM) Notes notes notes Desired Output: IDCommenterDateTimeComment1abcd0ef.abcd0ef1/1/2019 2:00 PMRandom Free Text1xyz0ef.abc0ef1/2/2019 1:00 AMResponse2a1cd0ef.abcd0ef12/1/2018 2:00 PMRandom Free Text2l0n0op.ghi0yf12/31/2018 1:00 AMOther Random Free Text2l0n0op.ghi0yf1/3/2019 6:00 PMMore text3d10d00.hai0sd12/15/2018 8:00 AMOther text3System Administrator12/15/2018 8:31 AMNotes notes notes Here's what I am thinking for process: 1. Tokenize comment blocks -- e.g. "(abcd0ef.abcd0ef - 1/1/2019 2:00 PM) Random Free Text" - This partially works if I use parse instead of tokenize, but I can't get the regex to capture the blocks - Currently using: ($.*$) 2. Parse commenter, datetime, comment Here are links to other posts that have been helpful so far: Regex Parsing Help Needed With Special Characters RegEx Get Repeated Parse Matches Extract Date from Text Regex Perl Syntax Guide & Regex Cheat Sheet Challenge 11 - Identify Logical Groups Attached is a file with my current work and a sample solution from another post on the Community. RegEx to Parse Comment Field.yxmd

RegEx: Parse Comment Field with Special Characters

Hi All,

I'm attempting to use RegEx to parse a comment field into multiple blocks, then find the commenter, time, and comment. I'm having trouble with the RegEx to break out the comment blocks at the moment. Does anyone know how to best solve this problem or have some links for leads?

Details on the problem below!

Input:

ID	Comment
1	(abcd0ef.abcd0ef - 1/1/2019 2:00 PM) Random Free Text (xyz0ef.abc0ef - 1/2/2019 1:00 AM) Response
2	(a1cd0ef.abcd0ef - 12/1/2018 2:00 PM) Random Free Text (l0n0op.ghi0yf - 12/31/2018 1:00 AM) Other Random Free Text (system administrator - 1/3/2019 6:00 PM) More text
3	(Donny Darko - d10d00.hai0sd - 12/15/2018 8:00 AM) Other text (System Administrator - 12/15/2018 8:31 AM) Notes notes notes

Desired Output:

ID	Commenter	DateTime	Comment
1	abcd0ef.abcd0ef	1/1/2019 2:00 PM	Random Free Text
1	xyz0ef.abc0ef	1/2/2019 1:00 AM	Response
2	a1cd0ef.abcd0ef	12/1/2018 2:00 PM	Random Free Text
2	l0n0op.ghi0yf	12/31/2018 1:00 AM	Other Random Free Text
2	l0n0op.ghi0yf	1/3/2019 6:00 PM	More text
3	d10d00.hai0sd	12/15/2018 8:00 AM	Other text
3	System Administrator	12/15/2018 8:31 AM	Notes notes notes

Here's what I am thinking for process:

1. Tokenize comment blocks -- e.g. "(abcd0ef.abcd0ef - 1/1/2019 2:00 PM) Random Free Text"

- This partially works if I use parse instead of tokenize, but I can't get the regex to capture the blocks

- Currently using: ($.*$)

2. Parse commenter, datetime, comment

Here are links to other posts that have been helpful so far:

Regex Parsing Help Needed With Special Characters

RegEx Get Repeated Parse Matches

Extract Date from Text

Regex Perl Syntax Guide & Regex Cheat Sheet

Challenge 11 - Identify Logical Groups

Attached is a file with my current work and a sample solution from another post on the Community.

RegEx to Parse Comment Field.yxmd

Regex

Parse

Accepted answers

jdunkerley79

I suggest using:

(\(.*?\)[^\(]+)

to tokenise the input comment. Please note this will get confused if message has brackets.

To parse into columns I suggest:

^\((.*?)\s+-\s+([0-9:/APM ]+)\)\s+(.*?)\s*$

This will break into three columns. It is a little defensive against odd characters.

Sample attached.

Regex Split Parse.yxmd

All comments

ShawnM

Please see the attached workflow. It accomplishes your goal.

ShawnM

Please see the attached workflow. It accomplishes your goal.

Parsing Commenters.yxmd

jdunkerley79

I suggest using:

(\(.*?\)[^\(]+)

to tokenise the input comment. Please note this will get confused if message has brackets.

To parse into columns I suggest:

^\((.*?)\s+-\s+([0-9:/APM ]+)\)\s+(.*?)\s*$

This will break into three columns. It is a little defensive against odd characters.

Sample attached.

Regex Split Parse.yxmd

estherb47

Hi @katherineL

Looking at this, part of the challenge is that what delimits the comment blocks for the row parsing also starts the rows. With the tokenize method, you need to specify what defines the delimiter by basically telling Alteryx what not to capture, using the (?:). You put in the delimiting character/string after the :. If there are more than 1, use the | for "or"

So, for the first RegEx parse tool, I'm using the tokenize method, split to rows, with the following as the Regular Expression: (.*?)(?:$|$). This tells the tool to grab everything that's not either an end of the field, or an opening parenthesis. And the opening parenthesis or end of the field are the characters that delimit the rows.

The second RegEx parse uses the Parse method, with the Regular expression of (.+)\s-\s(.+?M)$\s(.+$)

Finally, a formula tool replaces "Donnie Darko" with the code representing his name. I'm sure there's a way to do that in the second RegEx parse tool, just didn't figure it out.

RegEx to Parse Comment Field_EHB.yxmd

estherb47

And here's a great resource on the RegEx parse tool: https://community.alteryx.com/t5/Alteryx-Knowledge-Base/Tool-Mastery-RegEx/ta-p/37689

katherineL

Lots of great replies and solutions! This was the simplest and most direct solution. Thanks @jdunkerley79 and all!

Quick Links

This months top contributors

atcodedog05 19458

Qiu 15866

binu_acs 15708

MarqueeCrew 13708

apathetichell 13703