I have been tinkering with a problem.
I have a record with a long string of a phone transcription embedded in it and I want to clean it so it makes more sense to an end user.
Here is a sample of what I have at this point:
1:HELLO 1:MY 1:NAME 1:IS 1:MONA 1:HOW 1:MAY 1:I 1:HELP 1:YOU 2:YES 2:I 2:NEED 2:TO 2:CHECK 2:THE 2:STATUS 2:ON 2:AN 2:ORDER 2:I 2:WAS 2:TOLD 2:THAT 2:IT 2:WAS 2:SUPPOSED 2:TO 2:BE 2:SHIPPED 2:OUT 2:MONDAY 2:AND 2:I 2:WAS 2:SUPPOSED 2:TO 2:RECEIVE 2:IT 2:YESTERDAY 2:I 2:DID 2:NOT 2:RECEIVE 2:IT 1:OKAY 1:ONE 1:MOMENT 1:AND 1:ILL 1:BE 1:GLAD 1:TO 1:CHECK 1:THAT 1:STATUS
I want it to appear like this:
Speaker1: HELLO MY NAME IS MONA HOW MAY I HELP YOU Speaker2: YES I NEED TO CHECK THE STATUS ON AN ORDER I WAS TOLD THAT IT WAS SUPPOSED TO BE SHIPPED OUT MONDAY AND I WAS SUPPOSED TO RECIEVE IT YESTERDAY AND I DID NOT RECIEVE IT Speaker1: OKAY ONE MOMENT AND ILL BE GLAD TO CHECK THAT STATUS
I will work with the output afterwards to make it easier to read, but I really want to group the words by speakers for the string without breaking it into more columns. I need to be able to use this workflow to convert more than one transcript at a time. I am thinking this can be done with regex, but I have not figured out a solution yet.
Any ideas?
Solved! Go to Solution.
Hey Jared,
I would do maybe something like this? It's not very clean but it works.
Use RegEx to Tokenize and split into rows your big long field. use "\w+" to match words (this is where someone could be much more clever)
Then I did a multi-row formula to find which speaker was speaking, then filter out the useless lines.
Leaves a result like this:
Here's one way to do it, without regex but still gets the job done. I've left the output in several rows since it sounds like you'll be parsing it anyway.
Hi @Jared_Durer
A touch of RegEx to parse the data into rows (using the space as delimiter. You can easily do this with Text to Columns too, using \s as the delimiter), then a touch of Formulas to clean the data (remove the number and : with Regex_Replace, and create a speaker column). Assign a row number, effectively which sentence the data belongs to, with Multi Row Formula, and build the sentences back with a Summarize.
Please let me know if this works!
Cheers,
Esther
I used elements from what each of you provided to come up with a solution that works. My main hurdle actually came from other records trying to get mixed up with each other.
Each word has a time associated with it. I used that time and the speaker number to put together phrases and eventually group them all together into a meaningful transcript.
Thank you all so much for your assistance.