Hi alteryx community!
I've been working on a problem and have not been able to come to a solution, I'm hoping someone can assist me. I believe the answer to this problem will be using a regex function to parse the data but I haven't been able to figure out how to create a formula that solves all my criteria. See details of my issue below:
- I used the 'pdf input' tool to convert a ~750 page pdf into an excel file which was used as the starting point in the workflow. After converting the pdf into excel all the columns were combined into one.
- I would like to separate the following: date, names (include first, middle and last), first set of two digits (these don't exist on all rows of data, see row 8 and 9 in the attached excel for an example), next five digits (also don't exist on all rows of data), the text after the previous five digits (should align with the text that has no digits preceding it - e.g. football, grass would be in the same column as left wing player - ice hockey etc.), the amount, and the three capitalized initials. As mentioned above, I believe the way to solve this would be through the use of a regex formula but I haven't been able to create one that can solve each of the above criteria.
- The telephone numbers at the end are irrelevant and can be removed.
Also worth noting: in some instances there is not telephone number at the end of the row of data (i.e. the three initials are the last inputs in the row). There are also a varying number of spaces between certain numbers/words in each of the rows.
Below is a screenshot of how I'm trying to get my data to end up looking:

Appreciate any help, thanks!