Alteryx Designer Desktop Discussions

johnnyt · ‎03-15-2020

Hi All,

Fairly new to Alteryx and need some help parsing a .txt file to separate what each speaker says into a new .txt file.

For example, I am looking to separate everything the CEO says into its own .txt file. It can be one continuous string if necessary. All of the files share the following format:

The list of speakers precedes the text of the actual call conversation.
Speakers' positions are formatted as "Name - Position".
The first speaker is always the operator.
When a new speaker speaks, the format is their full name (without position), then a new line with what they say.
- Ex. Steven Humphreys
  - "Ipsum lipsum"

The workflow needs to be flexible enough to work off of the position of the speaker and not their name.

I am not sure if there is a way to have Alteryx store the names of the speakers listed between the "Executives" cell and the "Operator" cell, and then check each line against that list for one that contains only a speaker name. For example, the logic would look something like: check for Steven Humphreys -> if found, store all lines of text following Steven Humphreys' name until another speaker is found -> if another speaker is found, stop at [row-1] -> continue until Steven Humphreys is found again.

Executives

Steven Humphreys - CEO

Sandra Wallach - CFO

Analysts

Mike Latimore - Northland Capital Markets

Operator
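
For illustration only, reading a header like the one above can be sketched in Python rather than Alteryx. This is a minimal sketch that assumes every speaker line uses " - " as the separator and that the first line reading exactly "Operator" ends the list; `parse_speaker_list` is a hypothetical name, not part of any attached workflow.

```python
def parse_speaker_list(lines):
    """Map each speaker's name to their position from a transcript header.

    Assumption: "Name - Position" lines appear before the first line that
    is exactly "Operator", which marks the start of the conversation.
    """
    positions = {}
    for raw in lines:
        line = raw.strip()
        if line == "Operator":
            break  # the conversation starts here, so stop reading the header
        if " - " in line:
            name, position = line.split(" - ", 1)
            positions[name.strip()] = position.strip()
    return positions


header = [
    "Executives", "", "Steven Humphreys - CEO", "", "Sandra Wallach - CFO", "",
    "Analysts", "", "Mike Latimore - Northland Capital Markets", "", "Operator",
]
positions = parse_speaker_list(header)

# Look speakers up by position rather than by name.
ceo_names = [name for name, pos in positions.items() if pos == "CEO"]
print(ceo_names)
```

Selecting by position this way keeps the logic independent of who the CEO is in any given file.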

I've attached a workflow that was created using the sample.txt file but it isn't flexible enough to work with the input_sample files. There are a couple of thousand text files I need to parse and using Alteryx would make my life so much easier. I appreciate all the help! 🙂

DavidP · ‎03-16-2020

This workflow sort of does what you want but there are some issues. I used the data from input sample 2.

It extracts the list of speakers/panelists from the first several rows and strips the titles. It then left joins those names back into the original data set and uses a multi-row formula to fill in the gaps.
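
The same join-and-fill-down idea can be sketched outside Alteryx in Python. This is not the attached workflow, only a rough illustration of the logic: a row that exactly matches a known name starts a new turn, and every following row is attributed to that speaker until the next name appears. `text_by_speaker` is a hypothetical name, and the sample rows are made up apart from "Ipsum lipsum".

```python
from collections import defaultdict


def text_by_speaker(rows, speaker_names):
    """Attribute each row of text to the most recent speaker name above it."""
    known = set(speaker_names)
    current = None
    spoken = defaultdict(list)
    for raw in rows:
        line = raw.strip()
        if not line:
            continue  # skip blank rows
        if line in known:
            current = line  # a row holding only a name starts a new turn
        elif current is not None:
            spoken[current].append(line)  # "fill down" the current speaker
    # One continuous string per speaker.
    return {name: " ".join(parts) for name, parts in spoken.items()}


rows = [
    "Operator",
    "Welcome to the call.",
    "Steven Humphreys",
    "Ipsum lipsum",
    "Sandra Wallach",
    "Dolor sit amet",
    "Steven Humphreys",
    "Consectetur",
]
result = text_by_speaker(rows, ["Operator", "Steven Humphreys", "Sandra Wallach"])
print(result["Steven Humphreys"])
```

Because the match is exact, this sketch has the same limitation as the workflow: a name spelled differently in the body is not recognized as a new speaker.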

Here are the issues - you can investigate them by looking at the Browse tool.

1. It does not have the names of the people asking questions, so it can't match them.

2. If there is even a slight spelling difference from the names at the top, they are not picked up, as you can see.

The workflow does get most of it right and writes the output to individual text files.

I'm not a fuzzy matcher (more of a black and white kind of guy), but perhaps you can play around with fuzzy matching to see if you can overcome issue 2.
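
As a rough idea of what fuzzy matching could look like for issue 2, here is a small Python sketch using the standard library's `difflib.get_close_matches`. It is not Alteryx's Fuzzy Match tool, and the 0.85 cutoff and the name `match_speaker` are assumptions chosen for illustration; the right threshold would need tuning against real transcripts.

```python
import difflib


def match_speaker(line, speaker_names, cutoff=0.85):
    """Return the known speaker name closest to `line`, or None.

    The comparison is case-sensitive; `cutoff` is an assumed similarity
    threshold between 0 and 1.
    """
    matches = difflib.get_close_matches(
        line.strip(), speaker_names, n=1, cutoff=cutoff
    )
    return matches[0] if matches else None


names = ["Operator", "Steven Humphreys", "Sandra Wallach"]

# A slightly misspelled name still resolves to the listed speaker.
close = match_speaker("Steven Humphrey", names)
# An ordinary line of speech does not resemble any name.
unrelated = match_speaker("Ipsum lipsum", names)
print(close, unrelated)
```

A lower cutoff catches more spelling variants but raises the risk that a short line of actual speech is mistaken for a speaker name.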

johnnyt · ‎03-21-2020

Hi David,

Thanks for the solution! I am actually having issues running the workflow with another similar file.

I edited the workflow a little to try to get it to match since I received some errors. The output file is just Operator, and it aggregates all the text but does not divide the text by speaker in the output file.

Can I get your thoughts on what is going wrong?

DavidP · ‎03-22-2020

Here's an updated version with your new file. I modified it to load the file with an Input Data tool and also changed the output format to csv with the delimiter set to \0, which works better than the flat ASCII choice. I also added a Formula tool that adds the file path and txt extension, and modified the Output Data tool to change the entire file path.

The path I chose is just the current path that the workflow file is saved in.
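
For comparison, the final step of writing one text file per speaker might look like this in Python. It is only a sketch under assumptions: `write_speaker_files` is a hypothetical name, the example writes to a temporary directory rather than the workflow's folder, and speaker names are assumed to be valid file names.

```python
import tempfile
from pathlib import Path


def write_speaker_files(text_by_speaker, out_dir):
    """Write each speaker's text to '<out_dir>/<speaker>.txt'."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for speaker, text in text_by_speaker.items():
        # Build the full path from the folder, speaker name, and extension.
        path = out_dir / f"{speaker}.txt"
        path.write_text(text, encoding="utf-8")
        written.append(path)
    return written


# Assumption: a temporary directory stands in for the output folder.
out_dir = tempfile.mkdtemp()
paths = write_speaker_files({"Steven Humphreys": "Ipsum lipsum"}, out_dir)
print(paths[0].name)
```

Names containing characters that are not allowed in file names would need to be cleaned up before building the path.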

Let me know if you have any further issues.

johnnyt · ‎09-13-2020

Sorry for the late acceptance but this was a great framework for me to work off of! Much appreciated.

Parsing a .txt Transcript to Separate Speakers Within the Text