Parsing Data from Single Column

Hi alteryx community!

I've been working on a problem and have not been able to come to a solution, I'm hoping someone can assist me. I believe the answer to this problem will be using a regex function to parse the data but I haven't been able to figure out how to create a formula that solves all my criteria. See details of my issue below:

- I used the 'pdf input' tool to convert a ~750 page pdf into an excel file which was used as the starting point in the workflow. After converting the pdf into excel all the columns were combined into one.

- I would like to separate the following: date, names (include first, middle and last), first set of two digits (these don't exist on all rows of data, see row 8 and 9 in the attached excel for an example), next five digits (also don't exist on all rows of data), the text after the previous five digits (should align with the text that has no digits preceding it - e.g. football, grass would be in the same column as left wing player - ice hockey etc.), the amount, and the three capitalized initials. As mentioned above, I believe the way to solve this would be through the use of a regex formula but I haven't been able to create one that can solve each of the above criteria.

- The telephone numbers at the end are irrelevant and can be removed.

Also worth noting: in some instances there is not telephone number at the end of the row of data (i.e. the three initials are the last inputs in the row). There are also a varying number of spaces between certain numbers/words in each of the rows.

Below is a screenshot of how I'm trying to get my data to end up looking:

Appreciate any help, thanks!

Alteryx Sample Data.xlsx

Datasets

Regex

Output

Tips and Tricks

Parse

Preparation

Accepted answers

atcodedog05

Hi @kge5293

It wasnt easy but was able to crack it.

Here is a workflow for the task.

Output:

Workflow:

Hope this helps 🙂 Please check and let me know

If this post helps you please mark it as solution. And give a like if you dont mind 😀👍

parse.yxmd

All comments

atcodedog05

Hi @kge5293

It wasnt easy but was able to crack it.

Here is a workflow for the task.

Output:

Workflow:

Hope this helps 🙂 Please check and let me know

If this post helps you please mark it as solution. And give a like if you dont mind 😀👍

parse.yxmd

atcodedog05

Hi @kge5293

Did you get a chance to check it out. Does it work as accepted.

kge5293

Thanks @atcodeog05 this is extremely helpful and solves 99% of my problem! When I was reviewing the data there were a handful of rows that didn't get parsed in the same format as the ones I initially shared I think because of minor nuances in their structure. Is there something that can be tweaked in the regex to parse these as well?

In the attached file the first row wasn't properly parsed during the first regex (no data was carried forward) and the remaining rows lost data after the second regex (only the amount and three initials were lost).

Alteryx Sample Data2.xlsx

rebeccamourtos

I am a newbie here, I am trying to figure out how to use the PDF Input to convert PDFs to Excel, how do you do that?

Quick Links

This months top contributors

atcodedog05 19458

Qiu 15865

binu_acs 15708

MarqueeCrew 13708

apathetichell 13703