Alteryx designer Discussions

Find answers, ask questions, and share expertise about Alteryx Designer.
Alteryx is here to help you solve your biggest data challenges. Read about the new Virtual Solution Center here.
SOLVED

Stripping RTF tags

Highlighted
7 - Meteor

I have a dataset with a bunch of RTF tags, which I need to strip and keep the plain text. The output will be a CSV, so I don't need to keep any formatting, only the text. 

Here are some examples of the tags and what the expected output after stripping is. 

 

Input with RTFExpected Output
{\rtf1\ansi\ansicpg1252\deff0\deflang1033{\fonttbl{\f0\fnil\fcharset0 Microsoft Sans Serif;}}
\viewkind4\uc1\pard\f0\fs17 left vm \par
}
left vm
{\rtf1\ansi\ansicpg1252\deff0\deflang1033{\fonttbl{\f0\fnil\fcharset0 Microsoft Sans Serif;}}
\viewkind4\uc1\pard\f0\fs17 $1894.04 EFT\par
}
$1894.04 EFT
{\rtf1\ansi\ansicpg1252\deff0\deflang1033{\fonttbl{\f0\fnil\fcharset0 Microsoft Sans Serif;}}
{\colortbl ;\red0\green0\blue0;}
\viewkind4\uc1\pard\cf1\f0\fs16 N/a, didn't leave a vm, will f/up\cf0\fs17\par
}
N/a, didn't leave a vm, will f/up
{\rtf1\ansi\ansicpg1252\deff0\deflang1033{\fonttbl{\f0\fnil\fcharset0 Microsoft Sans Serif;}{\f1\fnil Microsoft Sans Serif;}}
\viewkind4\uc1\pard\f0\fs17 12/19/14- lef vm regarding saturday meeting again. will follow up in January 2015 if they do not call back.\f1\par
\par
\f0 12/15/14- left vm regarding saturding meeting week of the 22nd, will follow up Friday if I do not hear back.\par
\par
\par
12/4- left vm. MD\f1\par
\par
\f0 11/26- left vm. MD\f1\par
\par
\f0 11/4- left vm. MD\f1\par
\par
\f0 thx for attending seminar. confirming Nov 1st sat, 1 or 3pm??\par
}
12/19/14- lef vm regarding saturday meeting again. will follow up in January 2015 if they do not call back.12/15/14- left vm regarding saturding meeting week of the 22nd, will follow up Friday if I do not hear back.12/4- left vm.11/26- left vm.11/4- left vm.thx for attending seminar. confirming Nov 1st sat, 1 or 3pm??

 

Is this something that can be done in Alteryx? Possibly with a Regex? 

Highlighted
Alteryx Certified Partner

Something like this should work:

 

 

 

\\f0\\fs\d\d(.*?)\\

 

 

 

EDITED - fixed

EDITED 2 - fixed better

Highlighted
Alteryx
Alteryx

Hi 

 

I knocked up the attached workflow.

 

Does this more-or-less do what you wanted to achieve in the output?

Highlighted
7 - Meteor

@jamielaird Thank you for the reply. I tried your regex but it didn't seem to work. On the left is the input. 

alteryx3.PNG

 

 

Highlighted
7 - Meteor

@RishiK Thank you for the workflow. It's somewhat working (understandably removing special characters in the actual message, I didn't expect otherwise), but is there a way to do it without splitting the text into a bunch of lines? 

Highlighted
Alteryx
Alteryx

If you go down the regex route then more often than not, it will be split into separate lines whilst you're cleansing it.  Is there a reason why that would be an issue?

Highlighted
7 - Meteor

It was more trying to avoid the hassle of piecing them back together, but ultimately I decided to go a different route. The regex method more or less worked but it didn't preserve the integrity of the data, which ended up being more important than I thought.

 

In case this helps anyone in the future, there is a RichTextBox in C# which is able to parse RTF tags and convert RTF to plaintext. My solution was to export the data as a CSV, read in the CSV file in C#, strip the tags, and output the clean text as a CSV which is then read in by Alteryx. As far as I know, Alteryx doesn't have C# integration (only Python and R), so this process isn't 100% automated but it's better than nothing. 

 

I'll accept your answer as solution since it was the best answer I received, thank you. 

Highlighted
5 - Atom

Just wanted to say this worked perfectly for me! Thank you so much. Would it be too much trouble for you  to explain how you constructed your RegEx expression?

Labels