community
cancel
Showing results for 
Search instead for 
Did you mean: 

Alteryx designer Discussions

Find answers, ask questions, and share expertise about Alteryx Designer.
SOLVED

Cleaning a string using regex

Meteor

I have been tinkering with a problem.

 

I have a record with a long string of a phone transcription embedded in it and I want to clean it so it makes more sense to an end user.

 

Here is a sample of what I have at this point:

1:HELLO 1:MY 1:NAME 1:IS 1:MONA 1:HOW 1:MAY 1:I 1:HELP 1:YOU 2:YES 2:I 2:NEED 2:TO 2:CHECK 2:THE 2:STATUS 2:ON 2:AN 2:ORDER 2:I 2:WAS 2:TOLD 2:THAT 2:IT 2:WAS 2:SUPPOSED 2:TO 2:BE 2:SHIPPED 2:OUT 2:MONDAY 2:AND 2:I 2:WAS 2:SUPPOSED 2:TO 2:RECEIVE 2:IT 2:YESTERDAY 2:I 2:DID 2:NOT 2:RECEIVE 2:IT 1:OKAY 1:ONE 1:MOMENT  1:AND 1:ILL 1:BE 1:GLAD 1:TO 1:CHECK 1:THAT 1:STATUS


 I want it to appear like this:

Speaker1: HELLO MY NAME IS MONA HOW MAY I HELP YOU Speaker2: YES I NEED TO CHECK THE STATUS ON AN ORDER I WAS TOLD THAT IT WAS SUPPOSED TO BE SHIPPED OUT MONDAY AND I WAS SUPPOSED TO RECIEVE IT YESTERDAY AND I DID NOT RECIEVE IT Speaker1: OKAY ONE MOMENT AND ILL BE GLAD TO CHECK THAT STATUS

 

I will work with the output afterwards to make it easier to read, but I really want to group the words by speakers for the string without breaking it into more columns. I need to be able to use this workflow to convert more than one transcript at a time. I am thinking this can be done with regex, but I have not figured out a solution yet.

 

Any ideas?

Alteryx Certified Partner

Hey Jared,


I would do maybe something like this? It's not very clean but it works.
Use RegEx to Tokenize and split into rows your big long field. use "\w+" to match words (this is where someone could be much more clever)

Then I did a multi-row formula to find which speaker was speaking, then filter out the useless lines.

Leaves a result like this:

Alteryx Partner

@Jared_Durer 


Here's one way to do it, without regex but still gets the job done.  I've left the output in several rows since it sounds like you'll be parsing it anyway.

 

Transcription.png

 

Transcription2.png

Pulsar

Hi @Jared_Durer 

A touch of RegEx to parse the data into rows (using the space as delimiter. You can easily do this with Text to Columns too, using \s as the delimiter), then a touch of Formulas to clean the data (remove the number and : with Regex_Replace, and create a speaker column). Assign a row number, effectively which sentence the data belongs to, with Multi Row Formula, and build the sentences back with a Summarize.

 

image.png


Please let me know if this works!

 

Cheers,

Esther

Highlighted
Meteor

I used elements from what each of you provided to come up with a solution that works. My main hurdle actually came from other records trying to get mixed up with each other. 

 

Each word has a time associated with it. I used that time and the speaker number to put together phrases and eventually group them all together into a meaningful transcript. 

Capture.PNG

Thank you all so much for your assistance. 

Labels