This site uses different types of cookies, including analytics and functional cookies (its own and from other sites). To change your cookie settings or find out more, click here. If you continue browsing our website, you accept these cookies.
I have been tinkering with a problem.
I have a record with a long string of a phone transcription embedded in it and I want to clean it so it makes more sense to an end user.
Here is a sample of what I have at this point:
1:HELLO 1:MY 1:NAME 1:IS 1:MONA 1:HOW 1:MAY 1:I 1:HELP 1:YOU 2:YES 2:I 2:NEED 2:TO 2:CHECK 2:THE 2:STATUS 2:ON 2:AN 2:ORDER 2:I 2:WAS 2:TOLD 2:THAT 2:IT 2:WAS 2:SUPPOSED 2:TO 2:BE 2:SHIPPED 2:OUT 2:MONDAY 2:AND 2:I 2:WAS 2:SUPPOSED 2:TO 2:RECEIVE 2:IT 2:YESTERDAY 2:I 2:DID 2:NOT 2:RECEIVE 2:IT 1:OKAY 1:ONE 1:MOMENT 1:AND 1:ILL 1:BE 1:GLAD 1:TO 1:CHECK 1:THAT 1:STATUS
I want it to appear like this:
Speaker1: HELLO MY NAME IS MONA HOW MAY I HELP YOU Speaker2: YES I NEED TO CHECK THE STATUS ON AN ORDER I WAS TOLD THAT IT WAS SUPPOSED TO BE SHIPPED OUT MONDAY AND I WAS SUPPOSED TO RECIEVE IT YESTERDAY AND I DID NOT RECIEVE IT Speaker1: OKAY ONE MOMENT AND ILL BE GLAD TO CHECK THAT STATUS
I will work with the output afterwards to make it easier to read, but I really want to group the words by speakers for the string without breaking it into more columns. I need to be able to use this workflow to convert more than one transcript at a time. I am thinking this can be done with regex, but I have not figured out a solution yet.
Any ideas?
Solved! Go to Solution.
Hey Jared,
I would do maybe something like this? It's not very clean but it works.
Use RegEx to Tokenize and split into rows your big long field. use "\w+" to match words (this is where someone could be much more clever)
Then I did a multi-row formula to find which speaker was speaking, then filter out the useless lines.
Leaves a result like this:
Here's one way to do it, without regex but still gets the job done. I've left the output in several rows since it sounds like you'll be parsing it anyway.
Hi @Jared_Durer
A touch of RegEx to parse the data into rows (using the space as delimiter. You can easily do this with Text to Columns too, using \s as the delimiter), then a touch of Formulas to clean the data (remove the number and : with Regex_Replace, and create a speaker column). Assign a row number, effectively which sentence the data belongs to, with Multi Row Formula, and build the sentences back with a Summarize.
Please let me know if this works!
Cheers,
Esther
I used elements from what each of you provided to come up with a solution that works. My main hurdle actually came from other records trying to get mixed up with each other.
Each word has a time associated with it. I used that time and the speaker number to put together phrases and eventually group them all together into a meaningful transcript.
Thank you all so much for your assistance.