Community Spring Cleaning week is here! Join your fellow Maveryx in digging through your old posts and marking comments on them as solved. Learn more here!

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Stripping RTF tags

dcoelho-nhld
7 - Meteor

I have a dataset with a bunch of RTF tags, which I need to strip and keep the plain text. The output will be a CSV, so I don't need to keep any formatting, only the text. 

Here are some examples of the tags and what the expected output after stripping is. 

 

Input with RTFExpected Output
{\rtf1\ansi\ansicpg1252\deff0\deflang1033{\fonttbl{\f0\fnil\fcharset0 Microsoft Sans Serif;}}
\viewkind4\uc1\pard\f0\fs17 left vm \par
}
left vm
{\rtf1\ansi\ansicpg1252\deff0\deflang1033{\fonttbl{\f0\fnil\fcharset0 Microsoft Sans Serif;}}
\viewkind4\uc1\pard\f0\fs17 $1894.04 EFT\par
}
$1894.04 EFT
{\rtf1\ansi\ansicpg1252\deff0\deflang1033{\fonttbl{\f0\fnil\fcharset0 Microsoft Sans Serif;}}
{\colortbl ;\red0\green0\blue0;}
\viewkind4\uc1\pard\cf1\f0\fs16 N/a, didn't leave a vm, will f/up\cf0\fs17\par
}
N/a, didn't leave a vm, will f/up
{\rtf1\ansi\ansicpg1252\deff0\deflang1033{\fonttbl{\f0\fnil\fcharset0 Microsoft Sans Serif;}{\f1\fnil Microsoft Sans Serif;}}
\viewkind4\uc1\pard\f0\fs17 12/19/14- lef vm regarding saturday meeting again. will follow up in January 2015 if they do not call back.\f1\par
\par
\f0 12/15/14- left vm regarding saturding meeting week of the 22nd, will follow up Friday if I do not hear back.\par
\par
\par
12/4- left vm. MD\f1\par
\par
\f0 11/26- left vm. MD\f1\par
\par
\f0 11/4- left vm. MD\f1\par
\par
\f0 thx for attending seminar. confirming Nov 1st sat, 1 or 3pm??\par
}
12/19/14- lef vm regarding saturday meeting again. will follow up in January 2015 if they do not call back.12/15/14- left vm regarding saturding meeting week of the 22nd, will follow up Friday if I do not hear back.12/4- left vm.11/26- left vm.11/4- left vm.thx for attending seminar. confirming Nov 1st sat, 1 or 3pm??

 

Is this something that can be done in Alteryx? Possibly with a Regex? 

9 REPLIES 9
jamielaird
14 - Magnetar

Something like this should work:

 

 

 

\\f0\\fs\d\d(.*?)\\

 

 

 

EDITED - fixed

EDITED 2 - fixed better 😉

RishiK
Alteryx
Alteryx

Hi 

 

I knocked up the attached workflow.

 

Does this more-or-less do what you wanted to achieve in the output?

dcoelho-nhld
7 - Meteor

@jamielaird Thank you for the reply. I tried your regex but it didn't seem to work. On the left is the input. 

alteryx3.PNG

 

 

dcoelho-nhld
7 - Meteor

@RishiK Thank you for the workflow. It's somewhat working (understandably removing special characters in the actual message, I didn't expect otherwise), but is there a way to do it without splitting the text into a bunch of lines? 

RishiK
Alteryx
Alteryx

If you go down the regex route then more often than not, it will be split into separate lines whilst you're cleansing it.  Is there a reason why that would be an issue?

dcoelho-nhld
7 - Meteor

It was more trying to avoid the hassle of piecing them back together, but ultimately I decided to go a different route. The regex method more or less worked but it didn't preserve the integrity of the data, which ended up being more important than I thought.

 

In case this helps anyone in the future, there is a RichTextBox in C# which is able to parse RTF tags and convert RTF to plaintext. My solution was to export the data as a CSV, read in the CSV file in C#, strip the tags, and output the clean text as a CSV which is then read in by Alteryx. As far as I know, Alteryx doesn't have C# integration (only Python and R), so this process isn't 100% automated but it's better than nothing. 

 

I'll accept your answer as solution since it was the best answer I received, thank you. 

BJACKSON
5 - Atom

Just wanted to say this worked perfectly for me! Thank you so much. Would it be too much trouble for you  to explain how you constructed your RegEx expression?

danejensen4
5 - Atom

Ok, this one hasn't a had a lot of activity recently, but just in case others come here like me looking for an answer, I stumbled on something that worked for me which is very easy.  At some point, Excel introduced a feature that if you have rich text tags in the cell, and you just go into the cell (using F2 or just clicking into it), then hit enter, it will convert it to text.  Not sure this will work for everyone, but you could have a step to write a file, then perform the above step, then another workflow after using the new text field.

 

If you don't want to hit F2 on each cell, you can use this VBA macro:

Sub EditModeForRange()
'
' EditModeForRange Macro
Dim rng As Range
Dim cel As Range

Set rng = Application.Selection

For Each cel In rng.Cells
With cel
Application.SendKeys "{F2}{ENTER}"
End With
Next cel

 

End Sub

JohnMaty
9 - Comet

I had a similar issue and this worked great!  I really need to work on my RegEx!

Labels